BACKGROUND OF THE INVENTION 

The invention generally relates to nucleic acids and polypeptides encoded therefrom. 
More specifically, the invention relates to nucleic acids encoding cytoplasmic, nuclear, 
membrane bound, and secreted polypeptides, as well as vectors, host cells, antibodies, and 
recombinant methods for producing these nucleic acids and polypeptides. 

SUMMARY OF THE INVENTION 

The invention is based in part upon the discovery of nucleic acid sequences encoding 
novel polypeptides. The novel nucleic acids and polypeptides are referred to herein as NOVX, 
or NOV1-NOV91 nucleic acids and polypeptides. These nucleic acids and polypeptides, as 
well as derivatives, homologs, analogs and fragments thereof, will hereinafter be collectively 
designated as "NOVX" nucleic acid or polypeptide sequences. 

In one aspect, the invention provides an isolated NOVX nucleic acid molecule 
encoding a NOVX polypeptide that includes a nucleic acid sequence that has identity to the 
nucleic acids disclosed in SEQ ID NOS: 2n-1, wherein n is any integer between 1 and 107. In 
some embodiments, the NOVX nucleic acid molecule will hybridize under stringent 
conditions to a nucleic acid sequence complementary to a nucleic acid molecule that includes 
a protein-coding sequence of a NOVX nucleic acid sequence. The invention also includes an 
isolated nucleic acid that encodes a NOVX polypeptide, or a fragment, homolog, analog or 
derivative thereof. For example, the nucleic acid can encode a polypeptide at least 80% 
identical to a polypeptide comprising the amino acid sequences of SEQ ID NOS: 2n, where n 
is any integer between 1 and 107. The nucleic acid can be, for example, a genomic DNA 
fragment or a cDNA molecule that includes the nucleic acid sequence of any of SEQ ID 
NOS: 2n-1. 
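
By way of illustration only, the numbering convention recited above (nucleic acid SEQ ID NO: 2n-1 paired with polypeptide SEQ ID NO: 2n, for n between 1 and 107) and the "at least 80% identical" criterion can be sketched in Python. The function names are hypothetical, and the identity calculation assumes the two sequences have already been aligned to equal length by some external tool; the specification does not prescribe a particular alignment method here.

```python
def seq_id_pair(n: int) -> tuple:
    """Return (nucleic acid SEQ ID NO, polypeptide SEQ ID NO) for index n.

    Follows the 2n-1 / 2n convention stated in the text, n = 1..107.
    """
    if not 1 <= n <= 107:
        raise ValueError("n must be an integer between 1 and 107")
    return 2 * n - 1, 2 * n


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical positions in two pre-aligned sequences.

    Assumption (not from the specification): both inputs have equal
    length, and a gap character '-' is always counted as a mismatch.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 80.0) -> bool:
    """True when the aligned sequences are at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `seq_id_pair(1)` gives the pair of SEQ ID NOs for the first sequence index, and `meets_identity_threshold` applies the 80% figure by default.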

Also included in the invention is an oligonucleotide, e.g., an oligonucleotide which 
includes at least 6 contiguous nucleotides of a NOVX nucleic acid (e.g., SEQ ID NOS: 2n-1) 
or a complement of said oligonucleotide. 
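
As a non-limiting sketch, the "at least 6 contiguous nucleotides" criterion can be checked computationally as follows. The helper names are hypothetical, only unambiguous A/C/G/T bases are handled, and treating "a complement of said oligonucleotide" as its reverse complement is an assumption made for this example.

```python
# Translation table for unambiguous DNA bases only (an assumption;
# IUPAC ambiguity codes are not handled in this sketch).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_contiguous_match(oligo: str, target: str, min_len: int = 6) -> bool:
    """True if any window of `min_len` bases in `oligo` occurs in `target`."""
    oligo, target = oligo.upper(), target.upper()
    for i in range(len(oligo) - min_len + 1):
        if oligo[i:i + min_len] in target:
            return True
    return False


def oligo_matches(oligo: str, target: str, min_len: int = 6) -> bool:
    """Check the oligonucleotide and its reverse complement against the target."""
    return (has_contiguous_match(oligo, target, min_len)
            or has_contiguous_match(reverse_complement(oligo), target, min_len))
```

A candidate shorter than `min_len` never matches, since it cannot contain a window of the required length.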

Also included in the invention are substantially purified NOVX polypeptides (SEQ ID 
NOS: 2n). In certain embodiments, the NOVX polypeptides include an amino acid sequence 
that is substantially identical to the amino acid sequence of a human NOVX polypeptide. 

The invention also features antibodies that immunoselectively bind to NOVX 
polypeptides, or fragments, homologs, analogs or derivatives thereof. 

In another aspect, the invention includes pharmaceutical compositions that include 
therapeutically- or prophylactically-effective amounts of a therapeutic and a 
pharmaceutically-acceptable carrier. The therapeutic can be, e.g., a NOVX nucleic acid, a 
NOVX polypeptide, or an antibody specific for a NOVX polypeptide. In a further aspect, the 
invention includes, in one or more containers, a therapeutically- or prophylactically-effective 
amount of this pharmaceutical composition. 

In a further aspect, the invention includes a method of producing a polypeptide by 
culturing a cell that includes a NOVX nucleic acid, under conditions allowing for expression 
of the NOVX polypeptide encoded by the DNA. If desired, the NOVX polypeptide can then 
be recovered. 

In another aspect, the invention includes a method of detecting the presence of a 
NOVX polypeptide in a sample. In the method, a sample is contacted with a compound that 
selectively binds to the polypeptide under conditions allowing for formation of a complex 
between the polypeptide and the compound. The complex is detected, if present, thereby 
identifying the NOVX polypeptide within the sample. 

The invention also includes methods to identify specific cell or tissue types based on 
their expression of a NOVX. 

Also included in the invention is a method of detecting the presence of a NOVX 
nucleic acid molecule in a sample by contacting the sample with a NOVX nucleic acid probe 
or primer, and detecting whether the nucleic acid probe or primer bound to a NOVX nucleic 
acid molecule in the sample. 

In a further aspect, the invention provides a method for modulating the activity of a 
NOVX polypeptide by contacting a cell sample that includes the NOVX polypeptide with a 
compound that binds to the NOVX polypeptide in an amount sufficient to modulate the 
activity of said polypeptide. The compound can be, e.g., a small molecule, such as a nucleic 
acid, peptide, polypeptide, peptidomimetic, carbohydrate, lipid or other organic (carbon 
containing) or inorganic molecule, as further described herein. 

Also within the scope of the invention is the use of a therapeutic in the manufacture of 
a medicament for treating or preventing disorders or syndromes including, e.g., 
cardiomyopathy, atherosclerosis, hypertension, congenital heart defects, aortic stenosis, atrial 
septal defect (ASD), atrioventricular (A-V) canal defect, ductus arteriosus, pulmonary 
stenosis, subaortic stenosis, ventricular septal defect (VSD), valve diseases, hypercoagulation, 
hemophilia, idiopathic thrombocytopenic purpura, heart failure, secondary pathologies caused 
by heart failure and hypertension, hypotension, angina pectoris, myocardial infarction, 
tuberous sclerosis, scleroderma, transplantation, autoimmune disease, lupus erythematosus, 
viral/bacterial/parasitic infections, multiple sclerosis, autoimmune disease, allergies, 
immunodeficiencies, graft versus host disease, asthma, emphysema, ARDS, inflammation and 
modulation of the immune response, viral pathogenesis, aging-related disorders, Th1 
inflammatory diseases such as rheumatoid arthritis, multiple sclerosis, inflammatory bowel 
diseases, AIDS, wound repair, obesity, diabetes, endocrine disorders, anorexia, bulimia, renal 
artery stenosis, interstitial nephritis, glomerulonephritis, polycystic kidney disease, systemic, 
renal tubular acidosis, IgA nephropathy, nephrological diseases, hypercalcemia, 
Lesch-Nyhan syndrome, Von Hippel-Lindau (VHL) syndrome, trauma, regeneration (in vitro and in 
vivo), Hirschsprung's disease, Crohn's disease, appendicitis, endometriosis, laryngitis, 
psoriasis, actinic keratosis, acne, hair growth/loss, alopecia, pigmentation disorders, 
myasthenia gravis, alpha-mannosidosis, beta-mannosidosis, other storage disorders, 
peroxisomal disorders such as Zellweger syndrome, infantile Refsum disease, rhizomelic 
chondrodysplasia (chondrodysplasia punctata, rhizomelic), and hyperpipecolic acidemia, 
osteoporosis, muscle disorders, urinary retention, Albright hereditary osteodystrophy, 
ulcers, Alzheimer's disease, stroke, Parkinson's disease, Huntington's disease, cerebral palsy, 
epilepsy, Lesch-Nyhan syndrome, multiple sclerosis, ataxia-telangiectasia, behavioral 
disorders, addiction, anxiety, pain, neuroprotection, stroke, aphakia, neurodegenerative 
disorders, neurologic disorders, developmental defects, conditions associated with the role of 
GRK2 in brain and in the regulation of chemokine receptors, encephalomyelitis, anxiety, 
schizophrenia, manic depression, delirium, dementia, severe mental retardation and 
dyskinesias, Gilles de la Tourette syndrome, leukodystrophies, cancers, breast cancer, CNS 
cancer, colon cancer, gastric cancer, lung cancer, melanoma, ovarian cancer, pancreatic 
cancer, kidney cancer, colon cancer, prostate cancer, neuroblastoma, and cervical cancer, 
neoplasm, adenocarcinoma, lymphoma, uterus cancer, benign prostatic hypertrophy, fertility, 
control of growth and development/differentiation related functions such as but not limited 
to maturation, lactation and puberty, reproductive malfunction, and/or other pathologies and 
disorders of the like. 

The therapeutic can be, e.g., a NOVX nucleic acid, a NOVX polypeptide, or a NOVX- 
specific antibody, or biologically-active derivatives or fragments thereof. 

For example, the compositions of the present invention will have efficacy for treatment 
of patients suffering from the diseases and disorders disclosed above and/or other pathologies 
and disorders of the like. The polypeptides can be used as immunogens to produce antibodies 
specific for the invention, and as vaccines. They can also be used to screen for potential 
agonist and antagonist compounds. For example, a cDNA encoding NOVX may be useful in 
gene therapy, and NOVX may be useful when administered to a subject in need thereof. By 
way of non-limiting example, the compositions of the present invention will have efficacy for 
treatment of patients suffering from the diseases and disorders disclosed above and/or other 
pathologies and disorders of the like. 

The invention further includes a method for screening for a modulator of disorders or 
syndromes including, e.g., the diseases and disorders disclosed above and/or other pathologies 
and disorders of the like. The method includes contacting a test compound with a NOVX 
polypeptide and determining if the test compound binds to said NOVX polypeptide. Binding 
of the test compound to the NOVX polypeptide indicates the test compound is a modulator of 
activity, or of latency or predisposition to the aforementioned disorders or syndromes. 

Also within the scope of the invention is a method for screening for a modulator of 
activity, or of latency or predisposition to disorders or syndromes including, e.g., the diseases 
and disorders disclosed above and/or other pathologies and disorders of the like by 
administering a test compound to a test animal at increased risk for the aforementioned 
disorders or syndromes. The test animal expresses a recombinant polypeptide encoded by a 
NOVX nucleic acid. Expression or activity of NOVX polypeptide is then measured in the test 
animal, as is expression or activity of the protein in a control animal which recombinantly 
expresses NOVX polypeptide and is not at increased risk for the disorder or syndrome. Next, 
the expression of NOVX polypeptide in both the test animal and the control animal is 
compared. A change in the activity of NOVX polypeptide in the test animal relative to the 
control animal indicates the test compound is a modulator of latency of the disorder or 
syndrome. 

In yet another aspect, the invention includes a method for determining the presence of 
or predisposition to a disease associated with altered levels of a NOVX polypeptide, a NOVX 
nucleic acid, or both, in a subject (e.g., a human subject). The method includes measuring the 
amount of the NOVX polypeptide in a test sample from the subject and comparing the amount 
of the polypeptide in the test sample to the amount of the NOVX polypeptide present in a 
control sample. An alteration in the level of the NOVX polypeptide in the test sample as 
compared to the control sample indicates the presence of or predisposition to a disease in the 
subject. Preferably, the predisposition includes, e.g., the diseases and disorders disclosed 
above and/or other pathologies and disorders of the like. Also, the expression levels of the new 
polypeptides of the invention can be used in a method to screen for various cancers as well as 
to determine the stage of cancers. 

In a further aspect, the invention includes a method of treating or preventing a 
pathological condition associated with a disorder in a mammal by administering to a subject 
(e.g., a human subject) a NOVX polypeptide, a NOVX nucleic acid, or a NOVX-specific 
antibody in an amount sufficient to alleviate or prevent the pathological condition. In 
preferred embodiments, the disorder includes, e.g., the diseases and disorders disclosed above 
and/or other pathologies and disorders of the like. 

In yet another aspect, the invention can be used in a method to identify the cellular 
receptors and downstream effectors of the invention by any one of a number of techniques 
commonly employed in the art. These include but are not limited to the two-hybrid system, 
affinity purification, and co-precipitation with antibodies or other specific-interacting molecules. 

Unless otherwise defined, all technical and scientific terms used herein have the same 
meaning as commonly understood by one of ordinary skill in the art to which this invention 
belongs. Although methods and materials similar or equivalent to those described herein can 
be used in the practice or testing of the present invention, suitable methods and materials are 
described below. All publications, patent applications, patents, and other references 
mentioned herein are incorporated by reference in their entirety. In the case of conflict, the 
present specification, including definitions, will control. In addition, the materials, methods, 
and examples are illustrative only and not intended to be limiting. 

Other features and advantages of the invention will be apparent from the following 
detailed description and claims. 

DETAILED DESCRIPTION OF THE INVENTION 

The present invention provides novel nucleotides and polypeptides encoded thereby. 
Included in the invention are the novel nucleic acid sequences and their encoded polypeptides. 
The sequences are collectively referred to herein as "NOVX nucleic acids" or "NOVX 
polynucleotides" and the corresponding encoded polypeptides are referred to as "NOVX 
polypeptides" or "NOVX proteins." Unless indicated otherwise, "NOVX" is meant to refer to 
any of the novel sequences disclosed herein. Table A provides a summary of the NOVX 
nucleic acids and their encoded polypeptides. 

TABLE A. Sequences and Corresponding SEQ ID Numbers 



NOVX        Internal        SEQ ID NO       SEQ ID NO
Assignment  Identification  (nucleic acid)  (polypeptide)  Homology

1           CG57602-01      1               2              DJ0751H13.1 Protein
2           CG57558-01      3               4              Mac25/IGFBP7
3           CG57560-01      5               6              Calmodulin Binding Protein Kinase
4A          CG57547-01      7               8              TRANSIENT RECEPTOR POTENTIAL-RELATED PROTEIN
            CG57547-02      9               10             TRANSIENT RECEPTOR POTENTIAL-RELATED PROTEIN
5           CG57609-01      11              12             Epsin-3
6           CG57611-01      13              14             CD22, a B lymphocyte-restricted adhesion molecule
7           CG57595-01      15              16             MEGF8
8A          CG57452-01      17              18             protocadherin
8B          CG57452-02      19              20             protocadherin
9           CG57625-01      21              22             protocadherin
10          CG57553-01      23              24             T01CL3
11A         CG57488_01      25              26             alpha-macroglobulin
11B         CG57488_02      27              28             alpha-macroglobulin
11C         CG57488_03      29              30             alpha-macroglobulin
12A         CG57526_01      31              32             orphan transporter
12B         CG57526-02      33              34             orphan transporter
13          CG57570-01      35              36             cation transporter
14          CG57593-01      37              38             ABC transporter
15          CG57652-01      39              40             Diacylglycerol Kinase Alpha
16          CG57562-01      41              42             Cation-transporting ATPase
17          CG55914-01      43              44             ACYL-COA DESATURASE
18          CG57328-01      45              46             MYO-INOSITOL-1(OR 4)-MONOPHOSPHATASE
19          CG57358-01      47              48             spinster
20          CG57695-01      49              50             casein
21          CG57654-01      51              52             Gamma-Aminobutyric-Acid Receptor Gamma-2 Subunit Precursor (GABA-A Receptor)
22          CG57724-01      53              54             Carboxylesterase
23          CG57730-01      55              56             MAT-1 oncogene
24          CG5755-01       57              58             VACUOLAR PROTON-ATPASE SUBUNIT H
25          CG57503-01      59              60             MEGF7
26          CG57456-01      61              62             COP-COATED VESICLE MEMBRANE PROTEIN P24 PRECURSOR
27          CG57658-01      63              64             Connexin
28          CG57662-01      65              66             Connexin
29          CG57664-01      67              68             the MHC CLASS I ANTIGEN
30          CG57666-01      69              70             the MHC CLASS I ANTIGEN
31          CG57668-01      71              72             the MHC CLASS I ANTIGEN
32          CG57660-01      73              74             RETINOIC ACID RECEPTOR RESPONDER
33          CG57672-01      75              76             PHOSPHATIDYLINOSITOL 4-PHOSPHATE 5-KINASE
34          CG57680-01      77              78             Cyclophilin-type peptidyl-prolyl cis-trans isomerase
35          CG57670-01      79              80             pyruvate kinase
36A         CG57149_01      81              82             Cis/Trans Peptidyl Prolyl Isomerase
36B         CG57149_02      83              84             Cis/Trans Peptidyl Prolyl Isomerase
37          CG57151_01      85              86             Cis/Trans Peptidyl Prolyl Isomerase
38          CG57153_01      87              88             Cis/Trans Peptidyl Prolyl Isomerase
39          CG57155_01      89              90             Cis/Trans Peptidyl Prolyl Isomerase
40          CG57157_01      91              92             Cis/Trans Peptidyl Prolyl Isomerase
41          CG57159_01      93              94             Cis/Trans Peptidyl Prolyl Isomerase
42A         CG57226_01      95              96             Cis/Trans Peptidyl Prolyl Isomerase
42B         CG57226-02      97              98             Cis/Trans Peptidyl Prolyl Isomerase
43          CG57538-01      99              100            Ceruloplasmin
44A         CG57623-01      101             102            Leucine Rich Repeat
44B         CG57623-02      103             104            Leucine Rich Repeat
45A         CG57656-01      105             106            Ig/Fibronectin domain
45B         CG57656-02      107             108            Ig/Fibronectin domain
46          CG57682-01      109             110            G2/MITOTIC-SPECIFIC CYCLIN B2
47          CG57764-01      111             112            ALR
48          CG57713-01      113             114            SODIUM/BILE ACID COTRANSPORTER
49          CG57721-01      115             116            Prestin
50          CG57787_01      117             118            Sulfate Transporter
51          CG57785_01      119             120            Sulfate Transporter
52          CG57748-01      121             122            N-acetylgalactosaminyltransferase
53          CG57693-01      123             124            Protein Kinase
54          CG57707-01      125             126            Leucine-rich glioma-inactivated protein precursor
55          CG57306-01      127             128            Anion exchanger
56          CG57348_01      129             130            PR SET domain protein
57          CG57650-01      131             132            NONMUSCLE MYOSIN HEAVY CHAIN B
58          CG57766-01      133             134            Plasma retinol binding protein
59          CG57566-01      135             136            HIV-1 inducer of short transcripts binding protein
60A         CG57574-01      137             138            Beta tectorin
60B         CG57574-02      139             140            Beta tectorin
60C         CG57574-03      141             142            Beta tectorin
61          CG57505-01      143             144            KIAA1125
62A         CG57473-01      145             146            Zinc-finger protein BOP
62B         CG57473-02      147             148            Zinc-finger protein BOP
63          CG57777_01      149             150            Hypothetical secreted protein
64          CG57779_01      151             152            Hypothetical secreted protein
65          CG57781_01      153             154            Hypothetical secreted protein
66          CG57783_01      155             156            Hypothetical secreted protein
67A         CG57823-01      157             158            ACYLTRANSFERASE
67B         CG57823-02      159             160            ACYLTRANSFERASE
68          CG57801-01      161             162            guanine nucleotide exchange factor
69A         CG57719-01      163             164            Aspartate Aminotransferase
69B         CG57719-02      165             166            Aspartate Aminotransferase
70          CG57462-01      167             168            KIAA1337
71          CG57584-01      169             170            ZONA PELLUCIDA GLYCOPROTEIN 3 PRECURSOR
72          CG56761-01      171             172            Ankyrin repeat containing protein
73          CG57313_01      173             174            GPCR
74          CG57315_01      175             176            GPCR
75          CG57317_01      177             178            GPCR
76          CG57321_01      179             180            GPCR
77          CG57419_01      181             182            GPCR
78          CG57425-01      183             184            GPCR
79          CG57753-01      185             186            GPCR
80          CG56766-01      187             188            GPCR
81A         CG57847-01      189             190            GPCR
81B         CG57847-02      191             192            GPCR
82          CG57845-01      193             194            GPCR
83          CG57843-01      195             196            GPCR
84A         CG57841-01      197             198            GPCR
84B         CG57841-02      199             200            GPCR
85          CG57839-01      201             202            GPCR
86          CG57837-01      203             204            GPCR
87          CG56763-01      205             206            GPCR
88          CG56753-01      207             208            GPCR
89          CG57670-01      209             210            GPCR
90          CG57676-01      211             212            GPCR
91          CG57678-01      213             214            GPCR


NOVX nucleic acids and their encoded polypeptides are useful in a variety of 
applications and contexts. The various NOVX nucleic acids and polypeptides according to the 
invention are useful as novel members of the protein families according to the presence of 
domains and sequence relatedness to previously described proteins. Additionally, NOVX 
nucleic acids and polypeptides can also be used to identify proteins that are members of the 
family to which the NOVX polypeptides belong. 

Table A indicates homology of NOVX nucleic acids to known protein families. Thus, 
the nucleic acids and polypeptides, antibodies and related compounds according to the 
invention corresponding to a NOVX as identified in column 1 of Table A will be useful in 
therapeutic and diagnostic applications implicated in, for example, pathologies and disorders 
associated with the known protein families identified in column 5 of Table A. 

The NOVX nucleic acids and polypeptides can also be used to screen for molecules 
that inhibit or enhance NOVX activity or function. Specifically, the nucleic acids and 
polypeptides according to the invention may be used as targets for the identification of small 
molecules that modulate or inhibit, e.g., neurogenesis, cell differentiation, cell proliferation, 
hematopoiesis, wound healing and angiogenesis. 

Additional utilities for the NOVX nucleic acids and polypeptides according to the 
invention are disclosed herein. 

NOV1 

A disclosed NOV1 nucleic acid of 12660 nucleotides (also referred to as CG57602-01) 
encoding a DJ0751H13.1 PROTEIN-like protein is shown in Table 1A. An open reading 
frame was identified beginning with an ATG initiation codon at nucleotides 1-3 and ending 
with a TAA codon at nucleotides 12658-12660. The start and stop codons are in bold letters. 
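
By way of illustration only, the open reading frame properties recited above (an ATG initiation codon at the first three nucleotides, a stop codon at the last three, and a length divisible by three) can be checked with a short Python sketch. The function name and the toy sequences are hypothetical and are not part of the disclosure; the sketch assumes the standard stop codons TAA, TAG and TGA.

```python
# Standard stop codons (an assumption: the standard genetic code applies).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_summary(seq: str) -> dict:
    """Summarize basic open-reading-frame properties of a DNA sequence.

    Positions in `internal_stops` are 1-based nucleotide coordinates of
    the first base of each in-frame stop codon found before the last codon.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3          # ignore trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return {
        "length": len(seq),
        "in_frame": len(seq) % 3 == 0,
        "starts_with_atg": seq.startswith("ATG"),
        "ends_with_stop": seq[-3:] in STOP_CODONS,
        "internal_stops": [
            i * 3 + 1 for i, codon in enumerate(codons[:-1])
            if codon in STOP_CODONS
        ],
    }


# Arithmetic for the stated coordinates: nucleotides 1 to 12660 span
# 12660 / 3 = 4220 codons, the last of which is the stop codon.
assert 12660 % 3 == 0 and 12660 // 3 == 4220
```

Applied to a full-length sequence, an empty `internal_stops` list together with the three boolean flags set to true would be consistent with a single uninterrupted reading frame.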

Table 1A. NOV1 nucleotide sequence (SEQ ID NO:1). 

ATGCTACTCCCTGCCCTCCTCTTTGGGATGGCGTGGGCCCTGGCTGACGGGCGGTGGTGT 
GAGTGGACAGAGACCATCCGTGTGGAGGAGGAAGTGGCACCCCGTCAGGAGGACCTGGTA 
CCCTGTGCCAGCCTCGACCATTACAGCCGCCTGGGCTGGCGGCTGGACCTGCCCTGGAGT 
GGCCGCTCGGGGCTTACCCGGTCCCCAGCGCCTGGGCTCTGTCCTATCTACAAACCTCCA 
GAAACCCGGCCTGCCAAGTGGAACCGGACAGTGAGGACTTGTTGCCCAGGCTGGGGGGGC 
GCCCACTGCACTGAGGCCCTTGCCAAAGCCAGTCCTGAAGGCCACTGCTTTGCCATGTGG 
CAGTGCCAGCTACAGGCAGGCTCAGCTAATGCCTCAGCAGGAAGCCTGGAGGAGTGCTGC 
GCCCGGCCCTGGGGACGAAGCTGGTGGGATGGCAGCTCCCAGGCCTGCCGCAGCTGCTCC 
AGCCGACACCTGCCAGGCAGTGCCTCTTCTCCAGCCCTCCTGCAGCCCCTGGCAGGGGCT 
GTGGGCCAGCTCTGGAGCCAGCACCAGCGTCCCTCGGCCACCTGTGCCTCCTGGTCGGGC 
TTCCACTACCGCACCTTTGATGGCCGCCACTATCACTTCCTGGGCCGCTGCACCTACCTG 
CTGGCGGGTGCTGCGGACTCCACCTGGGCTGTCCACCTAACACCCGGGGACCGCTGCCCC 
CAGCCTGGACACTGTCAGCGGGTCCAGGTGACTATGGGACCCGAGGAGGTGCTGATCCAG 
GCTGGAAATGTGTCTGTGAAGGGGCAGCTGGTACCTGAAGGGCAGTCTTGGCTGCTCCAC 
GGGCTGAGCCTGCAATGGCTGGGGGACTGGCTGGTGCTGTCAGGAGGCCTGGGGGTCGTG 
GTGCGGCTGGACAGGACTGGCTCCATCTCCATCTCTGTGGACCACGAGCTCTGGGGACAG 
ACACAAGGCCTCTGTGGGCTCTACAATGGCTGGCCAGAGGATGACTTCATGGAGCCAGGC 
GGAGGGCTGGCCATGTTAGCAGCCACCTTTGGAAATTCCTGGAGGCTCCCTGGCTCGGAG 
GTTTCCCCCGCTGAGTACCACGAGGCCTGTCTCTTTGCCTACTGCGCAGGGGCCATGGCA 
GGCAGTGGGCAAGAGGGGCGGCAGCAGGCTGTTTGTGCCACCTTTGCCAGCTATGTCCAG 
GCCTGTGCCAGGCGGCACATCCACATTCGCTGGAGGAAGCCTGGCTTCTGCGAGCGCCTG 
TGCCCCGGGGGCCAGCTCTACTCCGACTGCGTCTCCCTCTGCCCACCCAGCTGCGAGGCG 
GTGGGTCAGGGAGAGGAGGAGTCCTGCAGGGAAGAGTGTGTGAGTGGCTGTGAGTGCCCG 
CGAGGCCTCTTCTGGAATGGCACCCTCTGTGTGCCTGCTGCCCACTGCCCCTGCTACTAC 
TGCCGCCAGCGCTATGTACCCGGTGACACCGTGCGCCAGCTGTGTAACCCCTGCGTGTGC 
AGGGATGGCCGCTGGCACTGTGCCCAGGCACTGTGCCCCGCCGAGTGTGCAGTGGGTGGG 
GACGGGCACTACCTCACCTTCGATGGGCGGAGCTACTCCTTCTGGGGTGGTCAAGGTTGC 
CGCTACAGCCTGGTGCAGCCCTATCCTTCCACGTGCCCCACCCCCACCATCAGACCCCCT 
GTTCCAGGAGCTGTGCTGGTCAATGGGCAGGATGTGGGCTTGCCCTGGATTGGCGCTGAG 
GGCCTCAGTGTGCGCCGAGCTTCCTCTGCCTTTCTGCTGCTGCGCTGGCCTGGGGCCCAG 
GTGCTCTGGGGACTGTCTGACCCTGTAGCCTACATCACCCTGGACCCCCGCCATGCCCAC 
CAGGTGCAGGGTCTGTGTGGCACCTTCACCCAGAACCAGCAGGACGACTTCCTGACACCA 
GCCGGAGATGTGGAAACTAGCATTGCTGCCTTTGCTAGCAAGTTCCAGGTGGCCGGCAAG 
GGAAGATGCCCCTCTGAGGACAGTGCCCTGCTGTCTCCCTGCACCACCCACTCCCAGCGC 
CACGCCTTCGCAGAGGCGGCCTGTGCCATCCTGCACAGCTCTGTCTTCCAGGAATGCCAC 
AGGCTGGTGGACAAAGAGCCATTCTATCTGCGCTGCCTGGCAGCCGTGTGTGGCTGTGAT 
CCCGGCAGTGACTGCCTGTGCCCGGTGCTGTCTGCCTATGCGCGTCGCTGTGCCCAGGAA 
GGTGCCTCACCTCCCTGGAGGAACCAGACCCTCTGCCCTGTTATGTGTCCTGGTGGCCAG 
GAGTACCGAGAGTGTGCCCCAGCATGCGGTCAACACTGCGGGAAACCAGAGGACTGTGGA 
GAGCTGGGCAGCTGTGTGGCTGGTTGTAACTGTCCTCTGGGGCTGCTGTGGGACCCTGAG 
GGCCAGTGTGTGCCCCCCAGCTTGTGCCCCTGCCAGCTCGGAGCCCGTCGCTATGCCCCT 
GGCAGTGCCACCATGAAGGAGTGCAACCGCTGGGAGCTTGTCTATGCCCCTGGTGCCTGT 
CTCCTCACCTGTGACAGCCCCAGCGCCAATCACTCCTGCCCTGCAGGCAGTACTGATGGC 
TGTGTCTGTCCACCAGGCACGGTGCTGCTGGACGAGCGCTGTGTGCCTCCTGACCTCTGT 
CCCTGCCGTCACAGTGGGCAGTGGTACCTGCCCAACGCCACCATCCAGGAAGACTGCAAC 
GTTTGCGTGTGCCGGGGCCGGCAGTGGCACTGCACAGGCCAGCGGCGCAGTGGGCGGTGC 
CAGGCATCAGGCGCCCCCCACTATGTGACATTTGACGGACTGGCCTTCACCTATCCTGGG 
GCCTGCGAGTATCTGCTGGTGCGAGAGGCCAGTGGCCTATTCACAGTCTCTGCCCAGAAC 
CTGCCCTGTGGGGCCAGCGGTCTCACCTGCACCAAAGCGCTGGCCGTGCGTCTGGAGGGC 
ACTGTTGTGCACATGCTCAGAGGGACTCGGGTCCTGGTGCAACTGTCCCCTCAGTTCCGT 
GGTCGCGTGGCTGGGCTGTGTGGTGACTTTGATGGAGATGCCAGTAATGATCTGCGGAGC 
CGCCAGGGCGTCCTGGAGCCCACAGCTGAACTGGCTGCCCACTCCTGGCGCCTCAGCCCC 
CTCTGCCCTGAGCCAGGAGACCTGCCACACCCCTGCACGATGAACACACACCGGGCTGGT 
TGGGCTCGGGCCCGCTGTGGGGCGCTGCTGCAGCCGCTCTTCACATTATGCCACGCGGAG 
GTCCCCCCGCAGCAGCACTATGAGTGGTGCCTGTATGACGCCTGCGGCTGCGACTCGGGG 
GGTGACTGTGAGTGCCTCTGCTCGGCCATTGCCACCTATGCAGATGAGTGTGCCCGGCAT 
GGGCACCACGTGCGCTGGCGTAGCCAGGAGCTCTGCTGTCTACACCAGACACCCTGTGCC 
CTCCATGGTGGTCACCTTGGCCAGCCAGCCTGGTGTGGTTGCATCCTTTTGCCCTTGTGC 
CTCAGTGATCCAAGGCTCTCTCCTCTCCACCCAGCCCTGCAGTGTGAAGGGGGACAGGTA 
TATGAGGCCTGTGGCCCCACGTGTCCCCCCACCTGCCATGAGCAGCATCCTGAGCCCGGG 
TGGCACTGCCAGGTGGTGGCCTGTGTGGAGGGCTGCTTCTGCCCCGAGGGGACTCTGCTG 
CACGGACGCTGTGTGCTCAGCACGTGCCAGGAAGGTCAATGGCATTGTGGGGGTGACGGT 
GGCCACTGTGAGGAGCTTGTGCCTGCCTGTGCAGAGGGAGAGGCCCTGTGCCAAGAGAAT 
GGGCACTGTGTGCCCCATGGGTGGCTTTGTGACAACCAGGACGACTGTGGCGATGGCTCT 
GATGAGGAGGGTGAGTGTCTTTGCCCATGCGTGGAAGCGACAGGGTTGGTCAGTCCTTGC 
ACATGTTGTGCCGCCCCAGGCTGTGGGGAGGGGCAGATGACTTGCAGCTCCGGCCACTGC 
CTGCCCCTGGCCCTGCTCTGTGACCGCCAGGATGACTGTGGAGATGGCACGGATGAGCCG 
AGCTATCCGTGCCCCCAGGGCTTGCTGGCCTGTGCCGATGGACGCTGCCTGCCGCCGGCC 
CTGCTCTGCGATGGGCATCCTGACTGTCTGGATGCCGCCGACGAGGAGTCCTGTCTGGGG 
CAGGTGACCTGCGTCCCCGGGGAGGTGTCCTGTGTTGATGGCACCTGCCTGGGGGCCATC 
CAGCTGTGTGACGGAGTCTGGGACTGCCCAGATGGAGCCGATGAGGGGCCGGGACACTGC 
CCCCTACCTTCTCTGCCCACACCTCCTGCCAGCACCTTGCCTGGCCCCTCCCCAGGCTCC 
CTGGACACTGCGTCAAGTCCCCTGGCCAGCGCCAGCCCTGCGCCACCCTGCGGCCCCTTC 
GAGTTTCGGTGCGGCAGCGGCGAGTGCACCCCGCGGGGCTGGCGCTGCGACCAGGAGGAA 
GACTGCGCCGACGGCAGCGACGAGCGCGGCTGCGGAGGGCCCTGCGCGCCGCACCACGCG 
CCCTGCGCCCGCGGCCCTCACTGCGTGTCCCCCGAGCAGCTGTGCGACGGCGTGCGGCAG 
TGTCCCGACGGCTCGGACGAGGGCCCCGACGCCTGCGTTGAGGCTCCCGCGCCCCCGGCC 
ATGCGCGGCCCCCCTGGCCAAGCCGGCGGGCCCACCTCTTCCCGAGCGCCATCCCCACCT 
TCGCCTCCTGAGGCACAGGGAGAGGGCAGGAAGGGACAGGAGCGGAGCAGGACACATCTC 
ACAGTGCCCGCAGGCTCCACCCAGCTGCCTCTGTGCCCTGGCCTCTTTCCCTGTGGTGTG 
GCTCCGGGGCTGTGCCTGACCCCTGAGCAGCTCTGTGATGGGATCCCAGACTGTCCCCAG 
GGCGAGGACGAGCTGGACTGCGGGGGGCTGCCAGCCCTGGGAGGCCCCAACAGGACAGGG 
CTTCCCTGCCCAGAATACACCTGCCCCAATGGCACCTGCATAGGCTTCCAGCTGGTGTGT 
GATGGGCAGCCTGACTGTGGAAGGCCAGGGCAGGTGGGCCCCTCCCCAGAAGAGCAGGGT 
TGTGGGGCCTGGGGCCCCTGGAGCCCATGGGGGCCCTGCAGCCGGACGTGTGGGCCCTGG 
GGCCAGGGCCGGAGCCGCCGCTGCTCCCCACTCGGCCTCCTGGTGCTACAGAACTGCCCA 
GGGCCTGAGCACCAGTCTCAGGCCTGCTTCACGGCAGCCTGCCCAGTGGACGGTGAATGG 
AGCACCTGGTCCCCCTGGTCTGTGTGCTCTGAGCCGTGCAGGGGCACCATGACGCGGCAA 
CGGCAGTGCCACTCACCCCAGAATGGGGGCCGCACCTGTGCTGCACTGCCCGGAGGCCTG 
CACAGCACCCGCCAGACCAAGCCTTGCCCTCAGGACGGCTGCCCCAATGCCACTTGCTCT 
GGGGAGCTGATGTTCCAGCCCTGTGCCCCCTGCCCACTGACCTGTGATGACATCTCTGGC 
CAGGTCACGTGCCCACCTGATTGGCCCTGCGGCAGCCCGGGCTGCTGGTGCCCAGAAGGG 
CAGGTGCTGGGCAGCGAGGGGTGGTGTGTGTGGCCCCGGCAGTGCCCCTGCCTGGTGGAC 
GGTGCCCGCTACTGGCCTGGGCAACGCATCAAGGCCGACTGCCAGCTCTGCATCTGCCAA 
GACGGACGGCCCCGACGCTGCCGACTCAACCCGGACTGCGCTGGTGAGGCCCTTCCCTCG 
GGGTCCCTAGTCCTCTCCCTGGACCGCCCAGCTGCACATCCACCACCTCCTTCAGGCTCT 
GACTGTTGGCCCTCCCTCAGTGGACTGTGGCTGGTCCTCCTGGTCACCCTGGGCCAAGTG 
CCTGGGCCCCTGTGGAAGCCAGAGCATCCAGTGGTCCTTCCGGAGCTCCAACAACCCCCG 
CCCCTCCGGCCGAGGTCGCCAGTGCCGTGGCATCCACCGCAAGGCACGCAGACGGAGCCC 
TGTGAGGGGTGTGAGCATCAGGGCCAGGTCCACCGTGTCGGGGAACGCTGGCATGGGGGC 
CCCTGCAGGGTGTGCCAGTGTCTGCACAACCTCACCGCACACTGCTCACCCTACTGCCCG 
CTCGGCAGCTGCCCCCAGGGCTGGGTCTTGGTGGAGGGGACGGGAGAATCATGCTGCCAC 
TGTGCCCTACCTGGAGAGAACCAGACGGTCCAGCCCATGGCCACTCCTGCCGCAGCTCCG 
GCTCCCAGTCCCCAGATCAGATTCCCTTTGGCCACTTACATTCTGCCTCCGTCAGGAGGC 
TCCTGCCGCCCTCTGTCCTCCCCTACTCCAGCCTGTCTCTCTCTTCTGCACCCAGACCCC 
TGCTATTCTCCCCTGGGGCTGGCCGGACTGGCTGAGGGGAGTCTGCATGCATCGTCCCAG 
CAGCTGGAACACCCCACCCAGGCTGCCCTCCTGGGGGCTCCCACCCAGGGGCCCAGCCCT 
CAGGGATGGCACGCTGGAGGGGATGCTTATGCCAAGTGGCACACTCGGCCCCATTACCTG 
CAGCTGGACCTGCTTCAGCCTCGGAACCTCACTGGCATCCTAGTGCCGGAGACTGGCTCC 
TCCAACGCATATGCCAGCAGCTTCTCACTCCAGTTCAGCAGCAATGGTCTACACTGGCAT 
GACTATCGTGACCTCCTGCCTGGCATCTTGCCCCTGCCCAAGGTATCACCCGCCCAAGGC 
CGATGGGGCCAGCAGCCCACCATGCCCTTTTGTGGGTTCCATAGTCTTTGTCCCCAAGGG 
CCTTCCAGTGTCCCCGAGGGGCATGGCCTGCATTCGATGCTTGTTGAATACCTGCTTTTC 
CCCAGAAACTGGGATGACCTGGACCCTGCCGTATGGACTTTCGGCCGCATGGTGCAGGCG 
AGGTTTGTCAGGGTGTGGCCCCACGATGTCCACCACAGCGATGTCCCCCTGCAGGTGGAG 
CTGCTGGGCTGCGAGCCAGGGGTTGGACTCCGCTGTGCCAGTGGTGAGTGTGTCCTGAGA 
GGGGGCCCTTGTGACGGTGTTCTGGACTGCGAGGATGGCTCGGATGAGGAGGGCTGTGTG 
TTGCTGCCTGAGGGCACTGGCAGGTATACTGTGGCCGGCCGTGCAGCTCACGCCCTTGGC 
CTGGCCTTTGAGGGGACAGCCATGTGGGAGGGGCCCGGCACTGCCTTCACCCCCAAGGTG 
CCCAGACCCTGCATGCTGAGGAGCTGCAGCCGGGGCCTGGCAGAGACTGAGCACTGGCCC 
CCTGGGCAGGAATCCCCCACGTCCCCGACAGAAGCCTGGGATACCCTCTCGAGGGCCCCC 
ACCTTCCTTTCCTGGGAAGGGGAGCTGGGCAAGCCTCACCTCCCTCTACCTACACCCACA 
GAGACAAGGCCCGTGAGTCCTGGCCCAGCCTCCGGGGTGCCTCACCATGGGGAATCTGTG 
CAGATGGTGACCACCACCCCCATACCCCAGATGGAGGCCAGGACCCTGCCACCAGGTATG 
GCAGCTGTGACGGTGGTGCCCCCACACCCTGTGACTCCAGCGACCCCTGCTGGTAAGATG 
CCAAGCCCTTCTACCTGTGTGCAAGGCCTCTGCCAGAGCGTCGCCCCAGGACCCTTCCCA 
CCTGTGCAGTGTGGCCCCGGCCAGACGCCCTGTGAGGTGCTGGGCTGCGTGGAACAGGCG 
CAGGTGTGTGATGGCAGGGAGGATTGCCTCGACGGCTCCGACGAGAGGCACTGCGGTGAG 
CTGCTGGAGGGCCTGCTGTCCTGTGGGGCCCTCTGTTCCCCGAGCCAGCTGAGCTGTGGC 
AGCGGGGAGTGTCTGTCTGCTGAGCGGCGCTGTGACCTGCGGCCTGACTGCCAGGATGGC 
TCGGACGAGGATGGCTGTGTGGACTGCGTGCTGGCCCCCTGGTCTGTCTGGAGCAGCTGC 
AGCCGCAGCTGTGGCCTGGGCCTCACCTTCCAGCGCCAGGAGCTGCTGCGGCCTCCTCTG 
CCAGGGGGCAGCTGCCCGCGTGACCGGTTCCGAAGCCAGTCCTGCTTTGTGCAGGCCTGC 
CCAGTGGCTGGGGCATGGGCCATGTGGGAGGCCTGGGGACCCTGCAGCGTCTCCTGCGGG 
GGTGGCCATCAGAGTCGCCAGAGAAGCTGTGTGGACCCCCCACCCAAGAATGGCGGTGCC 
CCCTGCCCCGGGGCCTCCCAAGAGAGGGCACCCTGCGGCTTGCAGCCCTGCTCAGGTGGC 
ACAGGTAAGGGGGTGCTGGGCTGGGGACACGGAGGGAGCACTGTGGGCACGGGGCGATTG 
GGCCTCCCAGCACCTAGGCTCACCTGGTGCCCATCCCCCACCAGACTGCGAGCTGGGCCG 
TGTGTATGTGAGTGCCGATCTGTGCCAGAAGGGGCTGGTGCCCCCATGCCCACCCTCCTG 
CCTGGATCCCAAGGCCAACAGAAGCTGCAGTGGGCATTGTGTGGAAGCCTCCCCTCCCTG 
CTGTGCCCCTTGGGGCTGAGTGCCCTCTTCCACATACTCCCAGGATGCCGCTGTCCCCCG 
GGGCTCCTTCTGCATGACACTCGCTGCCTGCCCCTCTCTGAGTGCCCCTGCCTGGTGGGC 
GAAGAGCTGAAGTGGCCAGGGGTGTCCTTCCTCCTGGGCAACTGCAGCCAATGCGTGTGT 
GAGAAGGGGGAGTTGCTGTGCCAACCAGGGGGCTGCCCCCTGCCCTGCGGCTGGTCAGCC 
TGGTCCTCCTGGGCTCCCTGCGACCGCTCCTGTGGCTCTGGAGTGAGGGCCAGGTTCAGG 
TCTCCCTCCAACCCTCCGGCAGCCTGGGGGGGTGCCCCGTGTGAAGGTGACCGGCAGGAA 
CTGCAGGGCTGCCACACAGTGTGTGGGACAGGTATAGCCGGGAGCCTGGGTGCAGGAGTC 
CCCCCTTCCTCCTCCCAATTCTGTACCCTGAGAACACATGGGATGGGACCCACTGACCAC 
TCTACGTGGGGGATTGAGGTGTTCGGCTGGACGCCCTGGACTTCCTGGTCCTCCTGCTCC 
CAAAGCTGCCTTGCCCCGGGAGGGGGCCCTGGCTGGCGCAGTCGTTCCCGACTCTGCCCC 
AGCCCTGGGGATTCATCCTGCCCAGGAGATGCCACCCAGGAGGAGCCCTGCAGCCCCCCT 
ATAGAGTGTACGGGCTTCTGCGCCCCCGGCTGCACCTGCCCCCCTGGTCTTTTCCTGCAC 
AATGCTAGCTGCCTGCCCCGCAGCCAGTGCCCCTGCCAGCTGCACGGGCAGCTCTATGCA 
TCAGGAGCAATGGCTCGCCTGGACTCCTGCAACAACTGCACCTGTGTCTCTGGTAAGATG 
GCATGCACCTCGGAGCGCTGCCCAGTGGCCTGTGGCTGGAGTCCCTGGACCCTGTGGAGT 
CTCTGTAGCTGCAGCTGCAACGTGGGCATTCGGCGCCGCTTCCGGGCAGGCACTGCACCC 
CCAGCTGCCTTTGGGGGTGCTGAGTGCCAAGGCCCCACCATGGAGGCTGAATTCTGCAGC 
CTGCGGCCATGTCCAGGGCCAGTGCCTGGCATGTGTCCCAGGGACAAGCAGTGGCTGGAC 
TGTGCCCAGGGCCCTGCCTCTTGTGCAGAGCTCAGCGCCCCAAGAGGGACTAACCAGACC 
TGCCACCCTGGCTGCCACTGCCCCTCTGGGATGCTTCTGCTGGTGAGCCCACGTGGTCAC 
CCTGGACCCCTTGGAGCCAGTGTTCAGCCTCCTGTGGCCCTGCCCGGTGCCATCGGCACC 
GGTTCTGTGCCAGGAGCTGGGGGATGGGGTCCATGGGGGCCCTGGTCCCACTGTAGCCGG 
AGCTGTGGGGGAGGCCTGCGGAGCCGGACCCGGGCCTGTGACCAGCCCCCACCCCAGGGC 
CTGGGGGATTACTGCGAGGGGCCACGGGCACAGGGGGAGGTCTGCCAGGCTCTGCCCTGC 
CCAGTGACCAACTGCACTGCCATTGAAGGGGCCGAGTATAGCCCCTGTGGCCCTCCGTGC 
CCTCGCTCCTGTGATGACCTAGTGCACTGCGTGTGGCGCTGCCAGCCTGGCTGCTACTGC 
CCACCAGGCCAGGTACTGAGTTCCAACGGGGCCATCTGCGTGCAGCCGGGTCACTGCAGC 
TGCCTGGACCTGCTGACCGGGCAGCGGCACCATCCGGGTGCTCGGCTGGCAAGGCCTGAC 
GGCTGCAACCACTGCACCTGCCTGGAGGGGAGGCTGAACTGCACAGACCTGCCCTGCCCA 
GACTGCGGGGGTGGCCAGAGTCTGCATCCCTGTGGGCAGCCCTGCCCCCGCTCCTGCCAG 
GACCTGTCCCCTGGGAGTGTGTGCCAGCCAGGCTCTGTGGGCTGCCAGCCCACTTGTGGG 
TGCCCCCTGGGCCAGCTCTCCCAGGACGGGCTGTGCGTGCCCCCAGCCCACTGCCGCTGC 
CAGTACCAGCCTGGAGCCATGGCCCCCTCCTTTGTCCCCAGCACCTGTGTGGCAGGCATT 
CTGCAATGCCAGGAGGTGCCTGACTGCCCGGACCCTGGGGTGTGGAGCTCTTGGGGCCCT 
TGGGAAGACTGCAGTGTTTCGTGTGGGGGCGGGGAGCAGCTGCGCTCCCGGCGCTGTGCT 
CGTCCTCCCTGCCCAGGGCCTGCCCGCCAGAGCCGCACATGCAGCACACAGGTCTGCAGA 
GAGGCAGGCTGCCCGGCTGGCCGCCTGTACCGTGAATGCCAGCCCGGCGAGGGATGCCCC 
TTCTCCTGCGCCCACGTCACGCAGCAGGTGGGCTGCTTCTCTGAGGGCTGCGAGGAGGGC 
TGCCACTGCCCCGAGGGCACCTTCCAGCACCGCCTGGCCTGTGTGCAGGAGTGCCCTTGT 
GTGCTGACAGCCTGGCTGCTGCAGGAGCTGGGAGCCACCATAGGTGACCCTGGTCAGCCC 
CTCGGGCCTGGAGATGAGCTGGACTCAGGCCAGACACTTCGTACAAGCTGTGGCAACTGC 
TCGTGTGCACACGGGAAGCTGTCTTGCTCCCTGGACGACTGCTTCGAGGCCGATGGTGGT 
TTCGGTCCCTGGAGCCCGTGGGGCCCGTGCTCCCGCTCCTGTGGAGGGCTGGGCACCCGT 
ACCCGCAGCCGCCAGTGTGTGCTCACCATGCCCACCCTCAGTGAGCTGCCCGTGTGCCCT 
GGCCCTGGCTGTGGGGCTGGGAACTGTTCCTGGACCTCCTGGGCCCCGTGGGAACCTTGC 
TCCCGCAGCTGCGGAGTGGGCCAGCAGCGCCGCCTGCGGGCATACCGTCCCCCTGGGCCC 
GGCGGGCACTGGTGCCCCAACATCCTTACTGCCTACCAAGAGCGCCGCTTCTGCAACCTG 
CGAGCCTGCCCAGAGGCCGGCTGCCCAGCAGGCATGGAGGTGGTCACCTGTGCCAACCGC 
TGCCCCCGCCGCTGCTCAGACCTCCAGGAGGGAATTGTGTGTCAGGACGACCAGGTCTGC 
CAGAAGGGCTGCCGCTGCCCAAAGGGGTCCCTGGAGCAGGATGGTGGCTGCGTGCCAATT 
GGGCACTGTGACTGCACCGATGCCCAGGGCCACAGCTGGGCCCCGGGGAGCCAGCACCAG 
GATGCCTGCAACAACTGCTCATGCCAAGCTGGGCAGCTCTCCTGCACGGCTCAGCCCTGC 
CCGCCTCCCACCCACTGTGCCTGGAGCCACTGGTCGGCCTGGAGTCCCTGCAGCCACTCA 
TGCGGGCCCAGAGGGCAGCAGAGCCGCTTCCGCTGCGGCCCGGGCCTGGCCTCTCGCTCT 
GGGTCCTGCCCCTGCCTGATGGCCAAGGCCGACCCCACCTGCAACAGCACCTTCCTCCAC 
CTGGACACCCAGGGCTGCTACTCAGGGCCCTGCCCAGACTCATGCCAGTGGAGTCTGTGG 
GGGCCATGGAGCCCCTGCCAGGTGCCCTGCAGTGGGGGGTTCAGGCTACGCTGGAGAGAG 
GCAGAGGCCCTCTGTGGAGGAGGCTGCCGGGAGCCATGGGCTCAAGACAGAAAGCTGCAA 
CGGAGGGCCCTGCCCAGCACCTGTGTCAACGAGTCCCTGGTGTGCCCACACCAGGAGTGT 
CCAGTCCTTGGGCCTTGGTCAGCCTGGAGCAGTTGCTCGGCCCCCTGTGGTGGGGGCACT 
ATGGAGCGACATCGGACTTGTGAGGGGGGTCCTGGGGTGGCACCATGCCAGGCCCAGGAC 
ACAGAGCAACGGCAGGAGTGTAACCTGCAGCCCTGCCCTGAGTGCCCCCCTGGCCAGGTG 
CTTAGTGCCTGTGCCACCTCATGCCCGTGCCTCTGCTGGCATCTGCAGCCTGGTGCCATC 
TGTGTGCAGGAGCCCTGCCAGCCTGGCTGTGGCTGCCCTGGAGGGCAGCATTCTCTGCCC 
TGGGGCCTCACCCTGACCCTGGAAGAGCAGGCCCAGGAGCTGCCCCCAGGGACTGTGCTC 
ACCCGGAACTGCACCCGCTGTGTCTGCCACGGTGGAGCCTTCAGCTGCTCCCTCGTTGAC 
TGTCAGGGTGAGATAGTGCCCCCTGGGGAAACGTGGCAGCAGGTGGCCCCGGGGGAGCTG 
GGGCTCTGCGAGCAGACGTGCCTGGAGATGAACGCCACAAAGACCCAGAGTAACTGCAGT 
TCAGCTCGAGCCTCGGGCTGCGTGTGCCAGCCCGGGCACTTCCGCAGCCAGGCAGGCCCC 
TGCGTCCCCGAAGACCACTGCGAGTGCTGGCACCTTGGGCGTCCCCACCTGGTGAGACAC 
CGAACCCCCTCTGCTACCACTCACCCATTCCTGACCCCAAGCCTCCCCATCTGTCTGTAA 



In a search of public sequence databases, the NOV1 nucleic acid sequence, located on 
chromosome 8, has 606 of 779 bases (77%) identical to a 
gb:GENBANK-ID:BTSCOSPON|acc:X93922.1 mRNA from Bos taurus (B. taurus mRNA for SCO-spondin 
protein). Public nucleotide databases include all GenBank databases and the GeneSeq patent 
database. 

In all BLAST alignments herein, the "E-value" or "Expect" value is a numeric 
indication of the probability that the aligned sequences could have achieved their similarity to 
the BLAST query sequence by chance alone, within the database that was searched. For 
example, the probability that the subject ("Sbjct") retrieved from the NOV1 BLAST analysis, 
e.g., B.taurus mRNA for SCO-spondin protein, matched the Query NOV1 sequence purely by 
chance is 2.4e-152. The Expect value (E) is a parameter that describes the number of hits one 
can "expect" to see just by chance when searching a database of a particular size. It decreases 
exponentially with the Score (S) that is assigned to a match between two sequences. 
Essentially, the E value describes the random background noise that exists for matches 
between sequences. 
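
The exponential dependence of the Expect value on the score can be illustrated with a short sketch. It assumes the conventional Karlin-Altschul relationship for a normalized bit score S', E = m * n * 2^(-S'), where m is the query length and n the database length; this formula and the function names are assumptions for illustration and are not stated in the text above, and the effective-length corrections applied by BLAST are ignored.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate number of chance hits scoring at least `bit_score`.

    Uses E = m * n * 2**(-S'), with m = query_len and n = db_len.
    Edge-effect (effective length) corrections are deliberately omitted.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def passes_threshold(e_value: float, threshold: float = 1e-4) -> bool:
    """Report a hit only if its E value does not exceed the chosen threshold."""
    return e_value <= threshold


# Each additional bit of score halves the expected number of chance hits.
e30 = expect_value(30, query_len=1_000, db_len=1_000_000)
e31 = expect_value(31, query_len=1_000, db_len=1_000_000)
```

With these illustrative lengths a 30-bit hit is expected about once by chance, so it would not pass a 0.0001 threshold, whereas a hit with a much higher bit score would.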

The Expect value is used as a convenient way to create a significance threshold for 
reporting results. The default value used for blasting is typically set to 0.0001. In BLAST 2.0, 
the Expect value is also used instead of the P value (probability) to report the significance of 
matches. For example, an E value of one assigned to a hit can be interpreted as meaning that 
in a database of the current size one might expect to see one match with a similar score simply 
by chance. An E value of zero means that one would not expect to see any matches with a 
similar score simply by chance. See, e.g., http://www.ncbi.nlm.nih.gov/Education/ 
BLASTinfo/. Occasionally, a string of X's or N's will result from a BLAST search. This is a 
result of automatic filtering of the query for low-complexity sequence that is performed to 
prevent artifactual hits. The filter substitutes any low-complexity sequence that it finds with 
the letter "N" in nucleotide sequence (e.g., "NNNNNNNNNNNNN") or the letter "X" in 
protein sequences (e.g., "XXXXXXXXX"). Low-complexity regions can result in high scores 
that reflect compositional bias rather than significant position-by-position alignment. 
(Wootton and Federhen, Methods Enzymol 266:554-571, 1996). 
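
The effect of such masking can be sketched with a deliberately simplified filter. The example below only masks runs of a single repeated letter; it is a toy stand-in chosen for illustration and is not the low-complexity algorithm actually used by BLAST. The function name and the run-length cutoff are assumptions.

```python
import re


def mask_homopolymer_runs(seq: str, min_run: int = 8, mask_char: str = "N") -> str:
    """Replace every run of >= min_run identical letters with mask_char.

    Use mask_char="N" for nucleotide queries and "X" for protein queries.
    """
    # (.) captures one letter; \1{min_run-1,} requires that many further copies.
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: mask_char * len(m.group(0)), seq)


masked = mask_homopolymer_runs("ACGT" + "A" * 10 + "GTC")
```

The masked string keeps its original length, so alignment coordinates reported against the query are unchanged.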

The disclosed NOV1 polypeptide (SEQ ID NO:2) encoded by SEQ ID NO:1 has 4219 
amino acid residues and is presented in Table 1B using the one-letter amino acid code. SignalP, 
Psort and/or Hydropathy results predict that NOV1 has a signal peptide and is likely to be 
localized extracellularly with a certainty of 0.5087. The most likely cleavage site for NOV1 is 
between positions 17 and 18. 



Table 1B. Encoded NOV1 protein sequence (SEQ ID NO:2). 

MLLPALLFGmWALADGRWCEWTETIRVEEEVAPRQEDLVPCASLDHYSRLGWRLDLPWS 
GRSGLTRSPAPGLCPIYKPPETRPAKWNRTVRTCCPGWGGAHCTEALAKASPEGHCFAMW 
QCQLQAGSANASAGSLEECCARPWGRSWWDGSSQACRSCSSRHLPGSASSPALLQPLAGA 
VGQLWSQHQRPSATCASWSGFHYRTFDGRHYHFLGRCTYLLAGAADSTWAVHLTPGDRCP 
QPGHCQRVQVTMGPEEVLIQAGNVSVKGQLVPEGQSWLLHGLSLQWLGDWLVLSGGLGW 
VRLDRTGSISISVDHELWGQTQGLCGLYNGWPEDDFMEPGGGIiAMIjAATFGNSWRLPGSE 
VSPAEYHEACLFAYCAGAMAGSGQEGRQQAVCATFASYVQACARRHIHIRWRKPGFCERL 
CPGGQLYSDCVSLCPPSCEAVGQGEEESCREECVSGCECPRGLFWNGTLCVPAAHCPCYY 
CRQRYVPGDTVRQLCNPCVCRDGRWHCAQALCPAECAVGGDGHYLTFDGRSYSFWGGQGC 
RYSLVQPYPSTCPTPTIRPPVPGAVLVNGQDVGLPWIGAEGLSVRRASSAFLLLRWPGAQ 
VLWGLSDPVAYITLDPRHAHQVQGLCGTFTQNQQDDFLTPAGDVETSIAAFASKFQVAGK 
GRCPSEDSALLSPCTTHSQRHAFAEAACAILHSSVFQECHRLVDKEPFYLRCLAAVCGCD 
PGSDCLCPVLSAYARRCAQEGASPPWRNQTLCPVMCPGGQEYRECAPACGQHCGKPEDCG 
ELGSCVAGCNCPLGLLWDPEGQCVPPSLCPCQLGARRYAPGSATMKECNRWELVYAPGAC 
LLTCDSPSANHSCPAGSTDGCVCPPGTVLLDERCVPPDLCPCRHSGQWYLPNATIQEDCN 
VCVCRGRQWHCTGQRRSGRCQASGAPHYVTFDGLAFTYPGACEYLLVREASGLFTVSAQN 
LPCGASGLTCTKALAVRLEGTVVHMLRGTRVLVQLSPQFRGRVAGLCGDFDGDASNDLRS 
RQGVLEPTAELAAHSWRLSPLCPEPGDLPHPCTMNTHRAGWARARCGALLQPLFTLCHAE 
VPPQQHYEWCLYDACGCDSGGDCECLCSAIATYADECARHGHHVRWRSQELCCLHQTPCA 
LHGGHLGQPAWCGCILLPLCLSDPRLSPLHPALQCEGGQVYEACGPTCPPTCHEQHPEPG 
WHCQWACVEGCFCPEGTLLHGRCVLSTCQEGQWHCGGDGGHCEELVPACAEGEALCQEN 
GHCVPHGWLCDNQDDCGDGSDEEGECLCPCVEATGLVSPCTCCAAPGCGEGQMTCSSGHC 
LPLALLCDRQDDCGDGTDEPSYPCPQGLLACADGRCLPPALLCDGHPDCLDAADEESCLG 
QVTCVPGEVSCVDGTCLGAIQLCDGVWDCPDGADEGPGHCPLPSIiPTPPASTLPGPSPGS 
LDTASSPLASASPAPPCGPFEFRCGSGECTPRGWRCDQEEDCADGSDERGCGGPCAPHHA 
PCARGPHCVSPEQLCDGVRQCPDGSDEGPDACVEAPAPPAMRGPPGQAGGPTSSRAPSPP 
SPPEAQGEGRKGQERSRTHLTVPAGSTQLPLCPGLFPCGVAPGLCLTPEQLCDGIPDCPQ 
GEDELDCGGLPALGGPNRTGLPCPEYTCPNGTCIGFQLVCDGQPDCGRPGQVGPSPEEQG 
CGAWGPWSPWGPCSRTCGPWGQGRSRRCSPLGLLVLQNCPGPEHQSQACFTAACPVDGEW 
STWSPWSVCSEPCRGTMTRQRQCHSPQNGGRTCAALPGGLHSTRQTKPCPQDGCPNATCS 
GELMFQPCAPCPLTCDDISGQVTCPPDWPCGSPGCWCPEGQVLGSEGWCVWPRQCPCLVD 
GARYWPGQRIKADCQLCICQDGRPRRCRLNPDCAGEALPSGSLVLSLDRPAAHPPPPSGS 
DCWPSLSGLWLVLLVTLGQVPGPLWKPEHPWLPELQQPPPLRPRSPVPWHPPQGTQTEP 
CEGCEHQGQVHRVGERWHGGPCRVCQCLHNLTAHCSPYCPLGSCPQGWVLVEGTGESCCH 
CALPGENQTVQPMATPAAAPAPSPQIRFPLATYILPPSGGSCRPLSSPTPACLSLLHPDP 
CYSPLGLiAGLAEGSLHASSQQLEHPTQAALLGAPTQGPSPQGWHAGGDAYAKWHTRPHYL 
QLDLLQPRNLTGILVPETGSSNAYASSFSLQFSSNGLHWHDYRDLLPGILPLPKVSPAQG 
RWGQQPTMPFCGFHSLCPQGPSSVPEGHGLHSMLVEYLLFPRNWDDLDPAVWTFGRMVQA 
RFVRVWPHDVHHSDVPLQVELLGCEPGVGLRCASGECVLRGGPCDGVLDCEDGSDEEGCV 
LLPEGTGRYTVAGRAAHALGLAFEGTAMWEGPGTAFTPKVPRPCMLRSCSRGLAETEHVJP 
PGQESPTSPTEAWDTLSRAPTFLSWEGELGKPHLPLPTPTETRPVSPGPASGVPHHGESV 
QMVTTTPIPQMEARTLPPGMAAVTVVPPHPVTPATPAGKMPSPSTCVQGLCQSVAPGPFP 
PVQCGPGQTPCEVLGCVEQAQVCDGREDCLDGSDERHCGELLEGLLSCGALCSPSQLSCG 
SGECLSAERRCDLRPDCQDGSDEDGCVDCVLAPWSVWSSCSRSCGLGLTFQRQELLRPPL 
PGGSCPRDRFRSQSCFVQACPVAGAWAMWEAWGPCSVSCGGGHQSRQRSCVDPPPKNGGA 
PCPGASQERAPCGLQPCSGGTGKGVLGWGHGGSTVGTGRLGLPAPRLTWCPSPTRLRAGP 
CVCECRSVPEGAGAPMPTLLPGSQGQQKLQWALCGSLPSLLCPLGLSALFHILPGCRCPP 
GLLLHDTRCLPLSECPCLVGEELKWPGVSFLLGNCSQCVCEKGELLCQPGGCPLPCGWSA 
WSSWAPCDRSCGSGVRARFRSPSNPPAAWGGAPCEGDRQELQGCHTVCGTGIAGSLGAGV 
PPSSSQFCTLRTHGMGPTDHSTWGIEVFGWTPWTSWSSCSQSCLAPGGGPGWRSRSRLCP 
SPGDSSCPGDATQEEPCSPPIECTGFCAPGCTCPPGLFLHNASCLPRSQCPCQLHGQLYA 
SGAMARLDSCNNCTCVSGKMACTSERCPVACGWSPWTLWSLCSCSCNVGIRRRFRAGTAP 
PAAFGGAECQGPTMEAEFCSLRPCPGPVPGMCPRDKQWLDCAQGPASCAELSAPRGTNQT 
CHPGCHCPSGMLLLVSPRGHPGPLGASVQPPVALPGAIGTGSVPGAGGWGPWGPWSHCSR 
SCGGGLRSRTRACDQPPPQGLGDYCEGPRAQGEVCQALPCPVTNCTAIEGAEYSPCGPPC 
PRSCDDLVHCVWRCQPGCYCPPGQVLSSNGAICVQPGHCSCLDLLTGQRHHPGARLARPD 
GCNHCTCLEGRLNCTDLPCPDCGGGQSLHPCGQPCPRSCQDLSPGSVCQPGSVGCQPTCG 
CPLGQLSQDGLCVPPAHCRCQYQPGAMAPSFVPSTCVAGILQCQEVPDCPDPGVWSSWGP 
WEDCSVSCGGGEQLRSRRCARPPCPGPARQSRTCSTQVCREAGCPAGRLYRECQPGEGCP 
FSCAHVTQQVGCFSEGCEEGCHCPEGTFQHRLACVQECPCVLTAWLLQELGATIGDPGQP 
LGPGDELDSGQTLRTSCGNCSCAHGKLSCSLDDCFEADGGFGPWSPWGPCSRSCGGLGTR 
TRSRQCVLTMPTLSELPVCPGPGCGAGNCSWTSWAPWEPCSRSCGVGQQRRLRAYRPPGP 
GGHWCPNILTAYQERRFCNLRACPEAGCPAGMEVVTCANRCPRRCSDLQEGIVCQDDQVC 
QKGCRCPKGSLEQDGGCVPIGHCDCTDAQGHSWAPGSQHQDACNNCSCQAGQLSCTAQPC 
PPPTHCAWSHWSAWSPCSHSCGPRGQQSRFRCGPGLASRSGSCPCLMAKADPTCNSTFLH 
LDTQGCYSGPCPDSCQWSLWGPWSPCQVPCSGGFRLRWREAEALCGGGCREPWAQDRKLQ 
RRALPSTCVNESLVCPHQECPVLGPWSAWSSCSAPCGGGTMERHRTCEGGPGVAPCQAQD 
TEQRQECNLQPCPECPPGQVLSACATSCPCLCWHLQPGAICVQEPCQPGCGCPGGQHSLP 
WGLTLTLEEQAQELPPGTVLTRNCTRCVCHGGAFSCSLVDCQGEIVPPGETWQQVAPGEL 
GLCEQTCLEMNATKTQSNCSSARASGCVCQPGHFRSQAGPCVPEDHCECWHLGRPHLVRH 
RTPSATTHPFLTPSLPICL 



A search of sequence databases reveals that the NOV1 amino acid sequence has 4056 
of 4056 amino acid residues (100%) identical to, and 4056 of 4056 amino acid residues 
(100%) similar to, the 4123 amino acid residue ptnr:SPTREMBL-ACC:O75851 protein from 
Homo sapiens (Human) (WUGSC:H_DJ0751H13.1 PROTEIN) (E = 0.0). Public amino acid 
databases include the GenBank databases, SwissProt, PDB and PIR. 

The disclosed NOV1 polypeptide has homology to the amino acid sequences shown in 
the BLASTP data listed in Table 1C. 




Table 1C. BLAST results for NOV1 

Gene Index/Identifier                    Protein/Organism                     Length (aa)  Identity (%)      Positives (%)     Expect
gi|3638957|gb|AAC36301.1| (AC004877)     sco-spondin-mucin-like; similar      4123         4056/4056 (100%)  4056/4056 (100%)  0.0
                                         to P98167 (PID:g1711548);
                                         [Homo sapiens]
7465206|ref|XP_069674.1| (XM_069674)     similar to sco-spondin-mucin-like;   1098         879/1060 (82%)    890/1060 (83%)    0.0
                                         similar to P98167 (PID:g1711548);
                                         (H. sapiens)
gi|3808095|emb|CAA69867.1| (Y08560)      SCO-spondin [Bos taurus]             870          600/990 (60%)     633/990 (63%)     0.0
gi|1711548|sp|P98167|SSPO_BOVIN          SCO-spondin, bovine                  867          582/1035 (56%)    635/1035 (61%)    0.0
gi|5823595|emb|CAB53760.1| (AJ132107)    SCO-spondin, bovine                  564          360/494 (72%)     395/494 (79%)     e-173
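
The Identity and Positives percentages in Table 1C can be reproduced from the listed ratios. The sketch below assumes the percentages are truncated to whole numbers rather than rounded; that is an observation drawn from the listed values (for example, 890/1060 is shown as 83%), not something stated in the text, and the function name is illustrative.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Percentage of matching positions, truncated to a whole number."""
    return (100 * matches) // aligned_length


# (matches, aligned length, percentage as listed in Table 1C)
table_1c_ratios = [
    (4056, 4056, 100),
    (879, 1060, 82),
    (890, 1060, 83),
    (600, 990, 60),
    (633, 990, 63),
    (582, 1035, 56),
    (635, 1035, 61),
    (360, 494, 72),
    (395, 494, 79),
]

all_consistent = all(blast_percent(m, n) == pct for m, n, pct in table_1c_ratios)
```

Note that the denominator is the length of the aligned region, not the full length of either protein, which is why a 4123-residue database entry can show 4056/4056 identity.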



The presence of identifiable domains in NOV1, as well as all other NOVX proteins, 
was determined by searches using software algorithms such as PROSITE, DOMAIN, Blocks, 
Pfam, ProDomain, and Prints, and then determining the Interpro number by crossing the 
domain match (or numbers) using the Interpro website (http://www.ebi.ac.uk/interpro). 
DOMAIN results for NOV1, as disclosed in Tables 1D-H, were collected from the Conserved 
Domain Database (CDD) with Reverse Position Specific BLAST analyses. This BLAST 
analysis software samples domains found in the Smart and Pfam collections. For Tables 1G-1O 
and all successive DOMAIN sequence alignments, fully conserved single residues are 
indicated by black shading or by the sign (|) and "strong" semi-conserved residues are 
indicated by grey shading or by the sign (+). The "strong" group of conserved amino acid 
residues may be any one of the following groups of amino acids: STA, NEQK, NHQK, 
NDEQ, QHRK, MILV, MILF, HY, FYW. 
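
The marking convention described above can be sketched as follows. The group list is taken from the text; the function names and the treatment of gap characters are assumptions made for illustration.

```python
STRONG_GROUPS = ("STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW")


def conservation_symbol(a: str, b: str) -> str:
    """Return '|' for identical residues, '+' for a "strong" pair, else ' '."""
    a, b = a.upper(), b.upper()
    if "-" in (a, b):  # assumption: gap positions are never marked
        return " "
    if a == b:
        return "|"
    if any(a in group and b in group for group in STRONG_GROUPS):
        return "+"
    return " "


def annotate(query: str, subject: str) -> str:
    """Build the marker line printed between two aligned sequences."""
    return "".join(conservation_symbol(q, s) for q, s in zip(query, subject))


markers = annotate("MILK", "MVLD")
```

Here M/M and L/L are identical, I/V share the MILV group, and K/D share no listed group, so the marker line reads `|+| `.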

Tables 1D-H list the domain descriptions from DOMAIN analysis results for NOV1. 
This indicates that the NOV1 sequence has properties similar to those of other proteins known 
to contain this domain. 



Table 1D. Domain Analysis of NOV1 

gnl|Pfam|pfam00754, F5_F8_type_C, F5/8 type C domain. This domain is 
also known as the discoidin (DS) domain family. 

CD-Length = 145 residues, 100.0% aligned 

Score = 115 bits (288), Expect = 5e-26 



Table 1E. Domain Analysis of NOV1 

gnl|Pfam|pfam00094, vwd, von Willebrand factor type D domain. 

CD-Length = 158 residues, 99.4% aligned 

Score = 97.1 bits (240), Expect = 2e-20 

Table 1F. Domain Analysis of NOV1 

gnl|Smart|smart00231, FA58C, Coagulation factor 5/8 C-terminal 
domain, discoidin domain; Cell surface-attached carbohydrate-binding 
domain, present in eukaryotes and assumed to have horizontally 
transferred to eubacterial genomes. 

CD-Length = 135 residues, 99.3% aligned 

Score = 63.2 bits (152), Expect = 3e-10 



Table 1G. Domain Analysis of NOV1 

gnl|Smart|smart00209, TSP1, Thrombospondin type 1 repeats; Type 1 
repeats in thrombospondin-1 bind and activate TGF-beta. 

CD-Length = 51 residues, 100.0% aligned 

Score = 53.1 bits (126), Expect = 3e-07 



Table 1H. Domain Analysis of NOV1 

gnl|Pfam|pfam00057, ldl_recept_a, Low-density lipoprotein receptor 
domain class A 

CD-Length = 39 residues, 94.9% aligned 

Score = 50.4 bits (119), Expect = 2e-06 
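
The score lines in Tables 1D-H follow a regular pattern and can be read programmatically. The following is a minimal sketch; the regular expression and function names are illustrative assumptions, and the shorthand such as "e-173" seen in the BLAST tables is interpreted here as 1e-173.

```python
import re

SCORE_LINE = re.compile(
    r"Score\s*=\s*([\d.]+)\s*bits\s*\((\d+)\),\s*Expect\s*=\s*(\S+)"
)


def parse_expect(text: str) -> float:
    """Convert an Expect string to a float, reading 'e-173' as 1e-173."""
    text = text.strip()
    if text.startswith("e"):
        text = "1" + text
    return float(text)


def parse_score_line(line: str) -> tuple[float, int, float]:
    """Return (bit score, raw score, Expect value) from a DOMAIN score line."""
    match = SCORE_LINE.search(line)
    if match is None:
        raise ValueError(f"not a score line: {line!r}")
    return float(match.group(1)), int(match.group(2)), parse_expect(match.group(3))


bits, raw, expect = parse_score_line("Score = 115 bits (288), Expect = 5e-26")
```

Parsed values can then be sorted or filtered, for example to list the domain hits in order of increasing Expect value.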

The disclosed NOV1 protein contains a thrombospondin type I repeat domain, which is 
found in the thrombospondin protein, where it is repeated 3 times. A number of proteins 
involved in the complement pathway (properdin, C6, C7, C8A, C8B, C9), as well as 
extracellular matrix proteins such as mindin, F-spondin and SCO-spondin, and even the 
circumsporozoite surface protein 2 and TRAP proteins of Plasmodium, contain one or more 
instances of this repeat. It has been implicated in cell-cell interaction, inhibition of 
angiogenesis, and apoptosis. The intron-exon organisation of the properdin gene confirms the 
hypothesis that the repeat might have evolved by a process involving exon shuffling. A study 
of properdin structure provides some information about the structure of the thrombospondin 
type I repeat. 

The disclosed NOV1 nucleic acid of the invention encoding a DJ0751H13.1 PROTEIN-like 
protein includes the nucleic acid whose sequence is provided in Table 1A or a fragment 
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may 
be changed from the corresponding base shown in Table 1A while still encoding a protein that 
maintains its DJ0751H13.1 PROTEIN-like activities and physiological functions, or a 
fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences 
are complementary to those just described, including nucleic acid fragments that are 
complementary to any of the nucleic acids just described. The invention additionally includes 
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include 
chemical modifications. Such modifications include, by way of nonlimiting example, 
modified bases, and nucleic acids whose sugar phosphate backbones are modified or 
derivatized. These modifications are carried out at least in part to enhance the chemical 
stability of the modified nucleic acid, such that they may be used, for example, as antisense 
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic 
acids, and their complements, up to about 27 percent of the bases may be so changed. 

The disclosed NOV1 protein of the invention includes the DJ0751H13.1 PROTEIN-like 
protein whose sequence is provided in Table 1B. The invention also includes a mutant or 
variant protein any of whose residues may be changed from the corresponding residue shown 
in Table 1B while still encoding a protein that maintains its DJ0751H13.1 PROTEIN-like 
activities and physiological functions, or a functional fragment thereof. In the mutant or 
variant protein, up to about 0 percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or 
(Fab)2, that bind immunospecifically to any of the proteins of the invention. 

The above defined information for this invention suggests that this DJ0751H13.1 
PROTEIN-like protein (NOV1) may function as a member of a "DJ0751H13.1 PROTEIN 
family". Therefore, the NOV1 nucleic acids and proteins identified here may be useful in 
potential therapeutic applications implicated in (but not limited to) various pathologies and 
disorders as indicated below. The potential therapeutic applications for this invention include, 
but are not limited to: protein therapeutic, small molecule drug target, antibody target 
(therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic 
marker, gene therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo 
and in vitro of all tissues and cell types composing (but not limited to) those defined here. 

The NOV1 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in cancer including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the DJ0751H13.1 
PROTEIN-like protein (NOV1) may be useful in gene therapy, and the DJ0751H13.1 
PROTEIN-like protein (NOV1) may be useful when administered to a subject in need 
thereof. By way of nonlimiting example, the compositions of the present invention will have 
efficacy for treatment of patients suffering from cardiomyopathy, atherosclerosis, 
hypertension, congenital heart defects, aortic stenosis, atrial septal defect (ASD), 
atrioventricular (A-V) canal defect, ductus arteriosus, pulmonary stenosis, subaortic stenosis, 
ventricular septal defect (VSD), valve diseases, tuberous sclerosis, scleroderma, obesity, 
transplantation, diabetes, autoimmune disease, renal artery stenosis, interstitial nephritis, 
glomerulonephritis, polycystic kidney disease, systemic lupus erythematosus, renal tubular 
acidosis, IgA nephropathy, hypercalcemia, Lesch-Nyhan syndrome, Von Hippel-Lindau 
(VHL) syndrome, Alzheimer's disease, stroke, tuberous sclerosis, Parkinson's disease, 
Huntington's disease, cerebral palsy, epilepsy, Lesch-Nyhan syndrome, multiple sclerosis, 
ataxia-telangiectasia, leukodystrophies, behavioral disorders, addiction, anxiety, pain, 
neuroprotection, cancers, and/or other pathologies and disorders. For example, a cDNA 
encoding the transmembrane receptor DJ0751H13.1 PROTEIN-like protein may be useful in 
transmembrane receptor DJ0751H13.1 PROTEIN therapy, and the transmembrane receptor 
DJ0751H13.1 PROTEIN-like protein may be useful when administered to a subject in need 
thereof. By way of nonlimiting example, the compositions of the present invention will have 
efficacy for treatment of patients suffering from cardiomyopathy, atherosclerosis, 
hypertension, congenital heart defects, aortic stenosis, atrial septal defect (ASD), 
atrioventricular (A-V) canal defect, ductus arteriosus, pulmonary stenosis, subaortic stenosis, 
ventricular septal defect (VSD), valve diseases, tuberous sclerosis, scleroderma, obesity, 
transplantation, diabetes, autoimmune disease, renal artery stenosis, interstitial nephritis, 
glomerulonephritis, polycystic kidney disease, systemic lupus erythematosus, renal tubular 
acidosis, IgA nephropathy, hypercalcemia, Lesch-Nyhan syndrome, Von Hippel-Lindau 
(VHL) syndrome, Alzheimer's disease, stroke, tuberous sclerosis, Parkinson's disease, 
Huntington's disease, cerebral palsy, epilepsy, Lesch-Nyhan syndrome, multiple sclerosis, 
ataxia-telangiectasia, leukodystrophies, behavioral disorders, addiction, anxiety, pain, 
neuroprotection, cancers, and other diseases, disorders and conditions of the like. Also, since 
this gene is expressed at a measurably higher level in several cancer cell lines (including breast 
cancer, CNS cancer, colon cancer, gastric cancer, lung cancer, melanoma, ovarian cancer and 
pancreatic cancer), it may be useful in diagnosis and treatment of these cancers. The NOV1 
nucleic acid encoding the DJ0751H13.1 PROTEIN-like protein of the invention, or fragments 
thereof, may further be useful in diagnostic applications, wherein the presence or amount of 
the nucleic acid or the protein are to be assessed. 

NOV1 nucleic acids and polypeptides are further useful in the generation of antibodies 
that bind immuno-specifically to the novel NOV1 substances for use in therapeutic or 
diagnostic methods. These antibodies may be generated according to methods known in the 
art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies" 
section below. The disclosed NOV1 proteins have multiple hydrophilic regions, each of 
which can be used as an immunogen. These novel proteins can be used in assay systems for 
functional analysis of various human disorders, which will help in understanding of pathology 
of the disease and development of new drug targets for various disorders. 
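
One common way to produce a hydrophobicity chart of the kind mentioned above is a sliding-window average over a per-residue hydropathy scale. The sketch below assumes the Kyte-Doolittle scale, a 9-residue window and a cutoff of -1.0 for calling a window hydrophilic; none of these choices is specified in the text, and the function names are illustrative.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[res] for res in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophilic_windows(seq: str, window: int = 9, cutoff: float = -1.0) -> list[int]:
    """0-based start positions of windows whose mean hydropathy is <= cutoff."""
    profile = hydropathy_profile(seq, window)
    return [i for i, value in enumerate(profile) if value <= cutoff]


starts = hydrophilic_windows("IIIIIRRRRR", window=5)
```

Only the 20 standard one-letter codes are accepted, so a sequence containing characters outside that set would need to be cleaned before it is scored.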

NOV2 

A disclosed NOV2 nucleic acid of 893 nucleotides (also referred to as CG57558-01) 
encoding a Mac25/IGFBP7-like protein is shown in Table 2A. Putative untranslated regions 
upstream and/or downstream from the coding region, if any, are underlined, and the start and 
stop codons are in bold letters. 



Table 2A. NOV2 nucleotide sequence (SEQ ID NO: 3). 



GTACCTTAAAGACAACAAACAAGCAAACACAACTTATAATTAAAAAACATGCAAAGGGCT 
CACCTTCCACTTCCTTCTGGTCCTGCTCCTCTTCTTCCTCCTCTCCTGCCTCCTCTTCTC 
CCTGTTCCATCAGACCTTCTGGGGCCCCTTTCAATAAGCAGCTGCTGGCCGGCCAGCCCT 
TGGGGGCAGGGCTGGAACCGGGGCAGGGGAGGCTGCGGGGCCACTCGCTGGAGAGGCAAA 
CAGGAAGGACTGCCCCCTGAGCGCCAGGCTTCGGGCCCGGGAATCGCCGCCGCCGCCGCC 
GCAGAGCTGCAGCTCGGGGCCGAGGGTAAGGAGGCGAGCCGGGAGCGGGAGGCCCGGGAG 
AGCTCCGCGGGTCCCCGCGCCCAGTCCCCAGCCGCGCCCCGACCCCGCCGCCCCGGGCCT 
AACGCGGCCGGCGAGGCCTACGCGGCGGCCGCCGTCACCGTGCTGGAGCCGCCGGCCTCC 
GACCCCGAGCTGCAGCCCGCCGAGCGCCCGCTGCCATCGCCGGGGTCCGGGGAGGGCGCC 
CCGGTCTTCCTCACGGGGCCTCGATCCCAGTGGGTGCTGCGGGGGGCGGAGGTGGTGCTG 
ACGTGCCGGGCGGGGGGCCTCCCCGAGCCCACACTGTACTGGGAGAAGGACGGGATGGCC 
CTGGACGAAGTGTGGGACAGCAGCCACTTCGCGCTCCAGCCGGGCCGCGCCGAGGACGGC 
CCCGGCGCGAGCCTGGCACTGCGCATCCTGGCGGCTCGGCTGCCGGATTCCGGCGTCTAC 
GTGTGCCACGCCCGCAACGCGCACGGCCACGCGCAGGCGGGGGCGCTGCTCCAGGTGCTG 
ACCCCCACCTTCCTGCCGCCAAGACAGCCCTAACCAAGGCCCAGAAAGGGTAG 



In a search of public sequence databases, the NOV2 nucleic acid sequence, located on 
chromosome 2, has 564 of 779 bases (72%) identical to a 
gb:GENBANK-ID:S56581|acc:S56581.1 mRNA from Rattus sp. (alpha inhibin gene {5' region} [rats, 
Genomic, 2141 nt]). Public nucleotide databases include all GenBank databases and the 
GeneSeq patent database. 

The disclosed NOV2 polypeptide (SEQ ID NO:4) encoded by SEQ ID NO:3 has 274 
amino acid residues and is presented in Table 2B using the one-letter amino acid code. SignalP, 
Psort and/or Hydropathy results predict that NOV2 has a signal peptide and is likely to be 
localized extracellularly with a certainty of 0.3700. The most likely cleavage site for a NOV2 
peptide is between amino acids 32 and 33. 
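
The relationship between the nucleotide sequence of Table 2A, the encoded protein and the predicted cleavage site can be sketched as a conceptual translation. The example assumes the standard genetic code and uses only the first 180 nucleotides of Table 2A; the first line is read with its start codon as ATG, which is an assumption based on the encoded protein beginning MQRA. The function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_first_orf(dna: str) -> str:
    """Translate from the first ATG to the first in-frame stop (or sequence end)."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start < 0:
        return ""
    residues = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


def split_signal_peptide(protein: str, last_signal_residue: int) -> tuple[str, str]:
    """Split between residue `last_signal_residue` and the next (1-based)."""
    return protein[:last_signal_residue], protein[last_signal_residue:]


# First three lines of Table 2A (assumed reading of the start codon, see above).
NOV2_5PRIME = (
    "GTACCTTAAAGACAACAAACAAGCAAACACAACTTATAATTAAAAAACATGCAAAGGGCT"
    "CACCTTCCACTTCCTTCTGGTCCTGCTCCTCTTCTTCCTCCTCTCCTGCCTCCTCTTCTC"
    "CCTGTTCCATCAGACCTTCTGGGGCCCCTTTCAATAAGCAGCTGCTGGCCGGCCAGCCCT"
)

protein_start = translate_first_orf(NOV2_5PRIME)
signal, mature_start = split_signal_peptide(protein_start, 32)
```

With a cleavage site between amino acids 32 and 33, the first 32 residues form the predicted signal peptide and the mature chain begins at residue 33.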



Table 2B. Encoded NOV2 protein sequence (SEQ ID NO:4). 



MQRAHLPLPSGPAPLLPPLLPPLLPVPSDLLGPLSISSCWPASPWGQGWNRGRGGCGATR 
WRGKQEGLPPERQASGPGIAAAAAAELQLGAEGKEASREREARESSAGPRAQSPAAPRPR 
RPGPNAAGEAYAAAAVTVLEPPASDPELQPAERPLPSPGSGEGAPVFLTGPRSQWVLRGA 
EVVLTCRAGGLPEPTLYWEKDGMALDEVWDSSHFALQPGRAEDGPGASLALRILAARLPD 
SGVYVCHARNAHGHAQAGALLQVLTPTFLPPRQP 

A search of sequence databases reveals that the NOV2 amino acid sequence has 80 of 
266 amino acid residues (30%) identical to, and 112 of 266 amino acid residues (42%) similar 
to, the 277 amino acid residue ptnr:SPTREMBL-ACC:Q07822 protein from Homo sapiens 
(Human) (MAC25 PROTEIN). Public amino acid databases include the GenBank databases, 
SwissProt, PDB and PIR. 

NOV2 is expressed in at least brain, ovary, breast, and testis. This information was derived 
by determining the tissue sources of the sequences that were included in the invention 
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or 
RACE sources. 

The disclosed NOV2 polypeptide has homology to the amino acid sequences shown in 
the BLASTP data listed in Table 2C. 



Table 2C. BLAST results for NOV2 

Gene Index/Identifier                           Protein/Organism             Length (aa)  Identity (%)      Positives (%)     Expect
gi|14734071|ref|XP_051017.1| (XM_051017)        KIAA0657 protein             1025         143/150 (95%)     143/150 (95%)     1e-66
                                                [Homo sapiens]
gi|13938170|gb|AAH07201.1|AAH07201 (BC007201)   (protein for                 1044         143/150 (95%)     143/150 (95%)     4e-66
                                                IMAGE:2961284)
                                                [Homo sapiens]
gi|18552587|ref|XP_087161.1| (XM_087161)        (protein for                 180          50/58 (86%)       51/58 (87%)       4e-17
                                                IMAGE:2961284)
                                                [Homo sapiens]
gi|9623317|gb|AAF90112.1|AF254363_1 (AF254363)  stretchin-MLCK               281          35/105 (33%)      50/105 (47%)      7e-8
                                                [Drosophila melanogaster]
gi|17559576|ref|NP_504582.1| (NM_072181)        titin                        2783         39/120 (32%)      57/120 (47%)      1e-7
                                                [Caenorhabditis elegans]

Table 2D lists the domain descriptions from DOMAIN analysis results against NOV2. 
This indicates that the NOV2 sequence has properties similar to those of other proteins known 
to contain this domain. 

Table 2D. Domain Analysis of NOV2 

gnl|Smart|smart00409, IG, Immunoglobulin 

CD-Length = 86 residues, 100.0% aligned 

Score = 57.4 bits (137), Expect = 1e-09 



Mac25 is a follistatin (FS)-like protein that has a growth-suppressing effect on a 
p53-deficient osteosarcoma cell line (Saos-2). The protein exhibits a strong homology to FS, an 
activin-binding protein, and part of its sequence includes the consensus sequence of 
members of the Kazal serine protease inhibitor family. The mac25 protein was localized in the 
cytoplasm and secreted into culture medium (1). Addition of recombinant mac25 protein 
(10^-7 M) into the culture medium induced significant suppression of the growth of human 
cervical carcinoma cells (HeLa) and murine embryonic carcinoma cells (P19), as well as 
osteosarcoma cells (Saos-2). The mac25 protein was co-immunoprecipitated with activin A, a 
result that suggests that mac25 may be a secreted tumor-suppressor that binds activin A. The 
mac25 protein exhibits homology to insulin-like growth factor-binding proteins (IGF-BPs) and 
to fibroblast growth factor receptor. The multi-functional nature of mac25 protein may be 
important for growth-suppression and/or cellular senescence. 

The disclosed NOV2 nucleic acid of the invention encoding a Mac25/IGFBP7-like 
protein includes the nucleic acid whose sequence is provided in Table 2A or a fragment 
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may 
be changed from the corresponding base shown in Table 2A while still encoding a protein that 
maintains its Mac25/IGFBP7-like activities and physiological functions, or a fragment of such 
a nucleic acid. The invention further includes nucleic acids whose sequences are 
complementary to those just described, including nucleic acid fragments that are 
complementary to any of the nucleic acids just described. The invention additionally includes 
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include 
chemical modifications. Such modifications include, by way of nonlimiting example, 
modified bases, and nucleic acids whose sugar phosphate backbones are modified or 
derivatized. These modifications are carried out at least in part to enhance the chemical 
stability of the modified nucleic acid, such that they may be used, for example, as antisense 
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic 
acids, and their complements, up to about 28 percent of the bases may be so changed. 

The disclosed NOV2 protein of the invention includes the Mac25/IGFBP7-like protein 
whose sequence is provided in Table 2B. The invention also includes a mutant or variant 
protein any of whose residues may be changed from the corresponding residue shown in Table 
2B while still encoding a protein that maintains its Mac25/IGFBP7-like activities and 
physiological functions, or a functional fragment thereof. In the mutant or variant protein, up 
to about 70 percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or 
(Fab)2, that bind immunospecifically to any of the proteins of the invention. 

The above defined information for this invention suggests that this Mac25/IGFBP7-like 
protein (NOV2) may function as a member of a "Mac25/IGFBP7 family". Therefore, the 
NOV2 nucleic acids and proteins identified here may be useful in potential therapeutic 
applications implicated in (but not limited to) various pathologies and disorders as indicated 
below. The potential therapeutic applications for this invention include, but are not limited to: 
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug 
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene 
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues 
and cell types composing (but not limited to) those defined here. 

The NOV2 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the Mac25/IGFBP7-like 
protein (NOV2) may be useful in gene therapy, and the Mac25/IGFBP7-like protein (NOV2) 
may be useful when administered to a subject in need thereof. By way of nonlimiting 
example, the compositions of the present invention will have efficacy for treatment of patients 
suffering from Von Hippel-Lindau (VHL) syndrome, Alzheimer's disease, stroke, tuberous 
sclerosis, hypercalcemia, Parkinson's disease, Huntington's disease, cerebral palsy, epilepsy, 
Lesch-Nyhan syndrome, multiple sclerosis, ataxia-telangiectasia, leukodystrophies, behavioral 
disorders, addiction, anxiety, pain, neurodegeneration, fertility, hypogonadism, endometriosis, 
hemophilia, hypercoagulation, idiopathic thrombocytopenic purpura, immunodeficiencies, 
graft versus host disease, or other pathologies or conditions. The NOV2 nucleic acid encoding 
the Mac25/IGFBP7-like protein of the invention, or fragments thereof, may further be useful 
in diagnostic applications, wherein the presence or amount of the nucleic acid or the protein 
are to be assessed. 

NOV2 nucleic acids and polypeptides are further useful in the generation of antibodies
that bind immunospecifically to the novel NOV2 substances for use in therapeutic or
diagnostic methods. These antibodies may be generated according to methods known in the
art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies"
section below. The disclosed NOV2 proteins have multiple hydrophilic regions, each of
which can be used as an immunogen. These novel proteins can be used in assay systems for
functional analysis of various human disorders, which will help in understanding of pathology
of the disease and development of new drug targets for various disorders.



NOV3 

A disclosed NOV3 nucleic acid of 1703 nucleotides (also referred to as CG57560-01)
encoding a Calmodulin Binding Protein Kinase-like protein is shown in Table 3A. Putative
untranslated regions upstream and/or downstream from the coding region, if any, are
underlined, and the start and stop codons are in bold letters.



Table 3A. NOV3 nucleotide sequence (SEQ ID NO:5). 

CCAGGTTGGGGTCTC CCAAAAGCAGCCCCT ATGTTGTGGGGAAGTGGAGAGTAGTACAGC 
TAAGCCAGACCCCATTGTGCCCGCAGGTTAGAGCCTGGCAA TGCCGTTTGGGTGTGTGAC 
TCTGGGCGACAAGAAGAACTATAACCAGCCATCGGAGGTGACTGACAGATATGATTTGGG 
ACAGGTCATCAAGACGGAGGAGTTTTGTGAAATCTTCCGGGCCAAGGACAAGACGACAGG 
CAAGCTGCACACCTGCAAGAAGTTCCAGAAGCGGGACGGCCGCAAGGTGCGGAAAGCTGC 
CAAGAACGAGATAGGCATCCTCAAGATGGTGAAGCATCCCAACATCCTACAGCTGGTGGA 
TGTGTTTGTGACCCGCAAGGAGTACTTTATCTTCCTGGAGCTGGCCACGGGGAGGGAGGT 
GTTTGACTGGATCCTGGACCAGGGCTACTACTCGGAGCGAGACACAAGCAACGTGGTACG 
GCAAGTCCTGGAGGCCGTGGCCTATTTGCACTCACTCAAGATCGTGCACAGGAATCTCAA 
GCTGGAGAACCTGGTTTACTACAACCGGCTGAAGAACTCGAAGATTGTCATCAGTGACTT 
CCATCTGGCTAAGCTAGAAAATGGCCTCATCAAGGAGCCCTGTGGGACCCCCGAGTATCT 
GGCCCCAGAGGTGGTAGGCCGGCAGCGGTATGGACGCCCTGTGGACTGCTGGGCCATTGG 
AGTCATCATGTACATCCTGCTTTCAGGCAACCCACCTTTCTATGAGGAGGTGGAAGAAGA 
TGATTATGAGAACCATGATAAGAATCTCTTCCGCAAGATCCTGGCTGGTGACTATGAGTT 
TGACTCTCCATATTGGGATGATATTTCGCAGGCAGCCAAAGACCTGGTCACAAGGCTGAT 
GGAGGTGGAGCAAGACCAGCGGATCACTGCAGAAGAGGCCATCTCCCATGAGTGGATTTC 
TGGCAATGCTGCTTCTGATAAGAACATCAAGGATGGTGTCTGTGCCCAGATTGAAAAGAA 
CTTTGCCAGGGCCAAGTGGAAGAAGGCTGTCCGAGTGACCACCCTCATGAAACGGCTCCG 
GGCACCAGAGCAGTCCAGCACGGCTGCAGCCCAGTCGGCCTCAGCCACAGACACTGCCAC 
CCCCGGGGCTGCAGGTGGGGCCACAGCTGCAGCTGCGAGTGGAGCTACCTCAGCCCCTGA 
GGGTGATGCTGCTCGTGCTGCAAAGAGTGATAATGTGGCCCCCGCAGACCGTAGTGCCAC 
CCCAGCCACAGATGGAAGTGCCACCCCAGCCACTGATGGCAGTGTCACCCCAGCCACCGA 
TGGAAGCATCACTCCAGCCACTGATGGGAGTGTCACCCCAGCCACTGACAGGAGCGCTAC 
TCCAGCCACTGATGGGAGAGCCACACCAGCCACAGAAGAGAGCACTGTGCCCACCACCCA 
7^AGCAGTGCCATGCTGGCCACCAAGGCAGCTGCCACCCCTGAGCCGGCTATGGCCCAGCC 
GGACAGCACAGCCCCAGAGGGCGCCACAGGCCAGGCTCCACCCTCTAGTAAAGGGGAAGA 
GGCTGCTGGTTATGCCCAGGAGTCTCAAAGGGAGGAGGCCAGCTGA GTAGGCAG CCTGGT 
GAGGGGGGGCAGGGGATGGGCAGGAGGGTGGGAGAGTGGATGAGGGGCTTCTCACTGTAC 
ATAGAGTCACTGGCATGATGCCC 



In a search of public sequence databases, the NOV3 nucleic acid sequence, located on
chromosome 3, has 1015 of 1158 bases (87%) identical to a
gb:GENBANK-ID:RATCBVA|acc:L22557.1 mRNA from Rattus norvegicus (Rattus norvegicus
vesicle-associated calmodulin-binding protein mRNA, complete cds). Public nucleotide databases
include all GenBank databases and the GeneSeq patent database.

The disclosed NOV3 polypeptide (SEQ ID NO:6) encoded by SEQ ID NO:5 has 501
amino acid residues and is presented in Table 3B using the one-letter amino acid code.
SignalP, Psort and/or Hydropathy results predict that NOV3 has no signal peptide and is
likely to be localized in the cytoplasm with a certainty of 0.4500.
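
The relationship between the nucleotide sequence of Table 3A and the encoded polypeptide can be illustrated with a minimal translation sketch. It assumes the standard genetic code and an open reading frame that begins at the start codon; the names `translate` and `min_orf_length` are hypothetical helpers introduced here for illustration and are not part of the disclosure. The example fragment is the first 48 bases of the coding region as printed in Table 3A.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here;
# the disclosure does not name a translation table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(orf: str) -> str:
    """Translate an in-frame ORF, stopping at the first stop codon."""
    orf = orf.upper().replace(" ", "")
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")  # X marks an unreadable codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def min_orf_length(n_residues: int) -> int:
    """Nucleotides needed to encode n_residues plus one stop codon."""
    return 3 * (n_residues + 1)


# First 48 bases of the NOV3 coding region as printed in Table 3A.
fragment = "ATGCCGTTTGGGTGTGTGACTCTGGGCGACAAGAAGAACTATAACCAG"
print(translate(fragment))   # MPFGCVTLGDKKNYNQ
print(min_orf_length(501))   # 1506
```

The translated fragment agrees with the first sixteen residues of Table 3B, and the 1506 nucleotides required for 501 residues plus a stop codon fit within the 1703-nucleotide sequence, leaving the remainder for untranslated regions.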



Table 3B. Encoded NOV3 protein sequence (SEQ ID NO:6). 



MPFGCVTLGDKKNYNQPSEVTDRYDLGQVIKTEEFCEIFRAKDKTTGKLHTCKKFQKRDG
RKVRKAAKNEIGILKMVKHPNILQLVDVFVTRKEYFIFLELATGREVFDWILDQGYYSER
DTSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYNRLKNSKIVISDFHLAKLENGLIKEP
CGTPEYLAPEVVGRQRYGRPVDCWAIGVIMYILLSGNPPFYEEVEEDDYENHDKNLFRKI
LAGDYEFDSPYWDDISQAAKDLVTRLMEVEQDQRITAEEAISHEWISGNAASDKNIKDGV
CAQIEKNFARAKWKKAVRVTTLMKRLRAPEQSSTAAAQSASATDTATPGAAGGATAAAAS
GATSAPEGDAARAAKSDNVAPADRSATPATDGSATPATDGSVTPATDGSITPATDGSVTP
ATDRSATPATDGRATPATEESTVPTTQSSAMLATKAAATPEPAMAQPDSTAPEGATGQAP
PSSKGEEAAGYAQESQREEAS



A search of sequence databases reveals that the NOV3 amino acid sequence has 1015
of 1158 amino acid residues (87%) identical to, and 1015 of 1158 amino acid residues (87%)
similar to, the 3655 amino acid residue gb:GENBANK-ID:RATCBVA|acc:L22557.1 protein
from Rattus norvegicus (Rattus norvegicus vesicle-associated calmodulin-binding protein
mRNA, complete cds). Public amino acid databases include the GenBank databases,
SwissProt, PDB and PIR.
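
The identity and similarity percentages quoted for these database searches follow from the match counts by simple arithmetic. The sketch below recomputes them; `percent_identity` is a hypothetical helper introduced for illustration. The quoted figures agree with whole-number truncation rather than rounding to the nearest integer (1015 of 1158 is about 87.65%, reported as 87%); that the search software truncates is an inference from the numbers, not something stated in the text.

```python
def percent_identity(matches: int, aligned: int) -> int:
    """Whole-number percentage of matching positions, truncated toward zero."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    if not 0 <= matches <= aligned:
        raise ValueError("matches must lie between 0 and the aligned length")
    return (100 * matches) // aligned


# (matches, aligned length, percentage as quoted) taken from the NOV3
# search results and from Table 3C.
quoted = [
    (1015, 1158, 87),
    (468, 512, 91),
    (476, 512, 92),
    (459, 504, 91),
    (465, 504, 92),
]

for matches, aligned, reported in quoted:
    computed = percent_identity(matches, aligned)
    print(f"{matches}/{aligned}: computed {computed}%, quoted {reported}%")
```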

NOV3 is expressed in at least Bone Marrow, Brain, Hypothalamus and Thalamus. This
information was derived by determining the tissue sources of the sequences that were included
in the invention, including but not limited to SeqCalling sources, Public EST sources,
Literature sources, and/or RACE sources.

The disclosed NOV3 polypeptide has homology to the amino acid sequences shown in 
the BLASTP data listed in Table 3C. 



Table 3C. BLAST results for NOV3 


Gene Index/Identifier                          Protein/Organism                                                         Length (aa)  Identity (%)    Positives (%)   Expect

gi|16924331|gb|AAH17363.1|AAH17363 (BC017363)  protein MGC8407 [Homo sapiens]                                           501          501/501 (100%)  501/501 (100%)  0.0
gi|13129008|ref|NP_076951.1| (NM_024046)       protein MGC8407 [Homo sapiens]                                           501          500/501 (99%)   500/501 (99%)   0.0
gi|17160946|gb|AAH17634.1|AAH17634 (BC017634)  Similar to vesicle-associated calmodulin-binding protein [Mus musculus]  512          468/512 (91%)   476/512 (92%)   0.0
gi|13027458|ref|NP_076490.1| (NM_024000)       vesicle-associated calmodulin-binding protein [Rattus norvegicus]        504          459/504 (91%)   465/504 (92%)   0.0
gi|13649042|ref|XP_017796.1| (XM_017796)       protein MGC8407 [Homo sapiens]                                           478          469/471 (99%)   470/471 (99%)   0.0


Table 3D lists the domain descriptions from DOMAIN analysis results against NOV3. 
This indicates that the NOV3 sequence has properties similar to those of other proteins known 
to contain this domain. 



Table 3D. Domain Analysis of NOV3

gnl|Smart|smart00220, S_TKc, Serine/Threonine protein kinases,
catalytic domain; Phosphotransferases. Serine or threonine-specific
kinase subfamily.

CD-Length = 256 residues, 100.0% aligned

Score = 271 bits (693), Expect = 7e-74

Protein phosphorylation is a fundamental process for the regulation of cellular 
functions. The coordinated action of both protein kinases and phosphatases controls the levels 
of phosphorylation and, hence, the activity of specific target proteins. One of the predominant 
roles of protein phosphorylation is in signal transduction, where extracellular signals are 
amplified and propagated by a cascade of protein phosphorylation and dephosphorylation 
events. Eukaryotic protein kinases are enzymes that belong to a very extensive family of
proteins which share a conserved catalytic core common to both serine/threonine and
tyrosine protein kinases. There are a number of conserved regions in the catalytic domain of
protein kinases. In the N-terminal extremity of the catalytic domain there is a glycine-rich 
stretch of residues in the vicinity of a lysine residue, which has been shown to be involved in 
ATP binding. In the central part of the catalytic domain there is a conserved aspartic acid 
residue which is important for the catalytic activity of the enzyme. Protein kinases are 
excellent small molecule drug targets for therapeutic intervention. 

The disclosed NOV3 nucleic acid of the invention encoding a Calmodulin Binding
Protein Kinase-like protein includes the nucleic acid whose sequence is provided in Table 3A
or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of
whose bases may be changed from the corresponding base shown in Table 3A while still
encoding a protein that maintains its Calmodulin Binding Protein Kinase-like activities and
physiological functions, or a fragment of such a nucleic acid. The invention further includes
nucleic acids whose sequences are complementary to those just described, including nucleic
acid fragments that are complementary to any of the nucleic acids just described. The
invention additionally includes nucleic acids or nucleic acid fragments, or complements
thereto, whose structures include chemical modifications. Such modifications include, by way
of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones
are modified or derivatized. These modifications are carried out at least in part to enhance the
chemical stability of the modified nucleic acid, such that they may be used, for example, as
antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or
variant nucleic acids, and their complements, up to about 13 percent of the bases may be so
changed.
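
The "up to about 13 percent" limit can be read as a bound on the fraction of positions at which a variant differs from the reference sequence. The following is a minimal sketch under that reading, assuming substitutions only (equal-length sequences, no insertions or deletions); `fraction_changed` and `within_limit` are hypothetical names introduced for illustration, and the variant strings are invented examples rather than disclosed sequences.

```python
def fraction_changed(reference: str, variant: str) -> float:
    """Fraction of positions at which variant differs from reference.

    Assumes substitutions only: both sequences must have the same length.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length")
    diffs = sum(1 for a, b in zip(reference.upper(), variant.upper()) if a != b)
    return diffs / len(reference)


def within_limit(reference: str, variant: str, max_percent: float = 13.0) -> bool:
    """True if the variant changes no more than max_percent of positions."""
    return 100.0 * fraction_changed(reference, variant) <= max_percent


# First 20 bases of the NOV3 coding region from Table 3A; the variants
# below are made-up illustrations with 2 and 3 substitutions.
reference = "ATGCCGTTTGGGTGTGTGAC"
variant_2 = "ATGCCATTTGGGTGCGTGAC"   # 2 of 20 changed (10%)
variant_3 = "ATGCCATTTGGATGCGTGAC"   # 3 of 20 changed (15%)

print(within_limit(reference, variant_2))  # True
print(within_limit(reference, variant_3))  # False
```

The same comparison applies to the protein sequences if residues are compared in place of bases.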

The disclosed NOV3 protein of the invention includes the Calmodulin Binding Protein
Kinase-like protein whose sequence is provided in Table 3B. The invention also includes a
mutant or variant protein any of whose residues may be changed from the corresponding
residue shown in Table 3B while still encoding a protein that maintains its Calmodulin Binding
Protein Kinase-like activities and physiological functions, or a functional fragment thereof. In
the mutant or variant protein, up to about 13 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this Calmodulin
Binding Protein Kinase-like protein (NOV3) may function as a member of a "Calmodulin
Binding Protein Kinase family". Therefore, the NOV3 nucleic acids and proteins identified
here may be useful in potential therapeutic applications implicated in (but not limited to)
various pathologies and disorders as indicated below. The potential therapeutic applications
for this invention include, but are not limited to: protein therapeutic, small molecule drug
target, antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic
and/or prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue
regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to)
those defined here.

The NOV3 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the Calmodulin Binding
Protein Kinase-like protein (NOV3) may be useful in gene therapy, and the Calmodulin
Binding Protein Kinase-like protein (NOV3) may be useful when administered to a subject in
need thereof. By way of nonlimiting example, the compositions of the present invention will
have efficacy for treatment of patients suffering from hemophilia, hypercoagulation, idiopathic
thrombocytopenic purpura, autoimmune disease, allergies, immunodeficiencies,
transplantation, graft versus host disease, Von Hippel-Lindau (VHL) syndrome, Alzheimer's
disease, stroke, tuberous sclerosis, hypercalcemia, Parkinson's disease, Huntington's disease,
cerebral palsy, epilepsy, Lesch-Nyhan syndrome, multiple sclerosis, ataxia-telangiectasia,
leukodystrophies, behavioral disorders, addiction, anxiety, pain, neurodegeneration, or other
pathologies or conditions. The NOV3 nucleic acid encoding the Calmodulin Binding Protein
Kinase-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV3 nucleic acids and polypeptides are further useful in the generation of antibodies
that bind immunospecifically to the novel NOV3 substances for use in therapeutic or
diagnostic methods. These antibodies may be generated according to methods known in the
art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies"
section below. The disclosed NOV3 proteins have multiple hydrophilic regions, each of
which can be used as an immunogen. These novel proteins can be used in assay systems for
functional analysis of various human disorders, which will help in understanding of pathology
of the disease and development of new drug targets for various disorders.
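
Prediction of hydrophilic regions from a hydrophobicity chart can be sketched as a sliding-window average over per-residue hydropathy values. The example below assumes the Kyte-Doolittle scale, a window of 7 residues and a cutoff of -1.5; none of these choices is specified in the text, and `hydropathy_profile` and `hydrophilic_windows` are hypothetical helper names. The test segment is taken from the NOV3 sequence of Table 3B.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the text names none).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 7) -> list:
    """Mean hydropathy of every window of the given length along seq."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def hydrophilic_windows(seq: str, window: int = 7, cutoff: float = -1.5) -> list:
    """(start index, peptide) for each window whose mean is at or below cutoff."""
    seq = seq.upper()
    profile = hydropathy_profile(seq, window)
    return [(i, seq[i:i + window]) for i, s in enumerate(profile) if s <= cutoff]


# Residues 51-70 of NOV3 as listed in Table 3B.
segment = "TCKKFQKRDGRKVRKAAKNE"
for start, peptide in hydrophilic_windows(segment):
    print(start, peptide)
```

Windows with strongly negative averages are candidate surface-exposed stretches; any peptide chosen this way would still need experimental confirmation as an immunogen.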

NOV4 

NOV4 includes two TRANSIENT RECEPTOR POTENTIAL-RELATED PROTEIN-like
proteins disclosed below. The disclosed sequences have been named NOV4a and NOV4b.

NOV4a 

A disclosed NOV4a nucleic acid of 4877 nucleotides (also referred to as CG57547-01)
encoding a TRANSIENT RECEPTOR POTENTIAL-RELATED PROTEIN-like protein is shown in
Table 4A. Putative untranslated regions upstream and/or downstream from the coding region,
if any, are underlined, and the start and stop codons are in bold letters.



Table 4A. NOV4a nucleotide sequence (SEQ ID NO:7).



AGTCCCCAGCCCCGTCGCCGGCGGAGGCGGGCGCGGGCGC GTTCCTGTGGCCAGTCACCC
GGAGGAGTTGGTCGCACAATTATGAAAGACTCGGCTTCTGCTGCTAGC GCCGGAGCTGAG 
TTAGTCCTGAGAAGGTTTCCCTGGGCGTTCCTTGTCCGGCCTCTGCTGCCGCCTCCGGAG 
ACGCTTCCCGATAGATGGCTACAGGCCGCGGAGGAGG AGGAGGTGGAGTTGCTGCCCTTC 
CGGAGTCCGCCCCGTGAGGAGAA TGTCCCAGAAATCCTGGATAGAAAGCACTTTGACCAA 
GAGGGAATGTGTATATATTATACCAAGTTCCAAGGACCCTCACAGGTGCCTTCCAGGATG 
TCAAATTTGTCAGCAACTCGTCAGGTGTTTTTGTGGTCGCTTGGTCAAGCAACATGCTTG 
TTTTACTGCAAGTCTTGCCATGAAATACTCAGATGTGAAATTGGGTGACCATTTTAATCA
GGCAATAGAAGAATGGTCTGTGGAAAAGCATACAGAACAGAGCCCAACGGATGCTTATGG
AGTCATAAATTTTCAAGGGGGTTCTCATTCCTACAGAGCTAAGTATGTGAGGCTATCATA 
TGACACCAAACCTGAAGTCATTCTGCAACTTCTGCTTAAAGAATGGCAAATGGAGTTACC 
CAAACTTGTTATCTCTGTACATGGGGGCATGCAGAAATTTGAGCTTCACCCACGAATCAA 
GCAGTTGCTTGGAAAAGGTCTTATTAAAGCTGCAGTTACAACTGGAGCCTGGATTTTAAC 
TGGAGGAGTAAACACAGGTGTGGCAAAACATGTTGGAGATGCCCTCAAAGAACATGCTTC 
CAGATCATCTCGAAAGATTTGCACTATCGGAATAGCTCCATGGGGAGTGATTGAAAACAG 
AAATGATCTTGTTGGGAGAGATGTAGTTGCTCCTTATCAAACCTTATTGAACCCCCTGAG 
CAAATTGAATGTTTTGAATAATCTGCATTCCCATTTCATATTGGTGGATGATGGCACTGT 
TGGAAAGTATGGGGCGGAAGTCAGACTGAGAAGAGAACTTGAAAAAACTATTAATCAGCA 
AAGAATTCATGCTATTGGCCAGGGTGTCCCTGTGGTGGCACTTATATTTGAGGGTGGGCC 
AAATGTTATCCTCACAGTTCTTGAATACCTTCAGGAAAGCCCCCCTGTTCCAGTAGTTGT
GTGTGAAGGAACAGGCAGAGCTGCAGATCTGCTAGCGTATATTCATAAACAAACAGAAGA 
AGGAGGGAATCTTCCTGATGCAGCAGAGCCCGATATTATTTCCACTATCAAAAAAACATT 
TAACTTTGGCCAGAATGAAGCACTTCATTTATTTCAAACACTGATGGAGTGCATGAAAAG 
AAAGGAGCTTATCACTGTTTTCCATATTGGGTCAGATGAACATCAAGATATAGATGTAGC 
AATACTTACTGCACTGCTAAAAGGTACTAATGCATCTGCATTTGACCAGCTTATCCTTAC 
ATTGGCATGGGATAGAGTTGACATTGCCAAAAATCATGTATTTGTTTATGGACAGCAGTG 
GCTGGTAGGATCCTTGGAACAAGCTATGCTTGATGCTCTTGTAATGGATAGAGTTGCATT 
TGTAAAACTTCTTATTGAAAATGGAGTAAGCATGCATAAATTCCTTACCATTCCGAGACT 
GGAAGAACTTTACAACACTAAACAAGGTCCAACTAATCCAATGCTGTTTCATCTTGTTCG 
AGACGTCAAACAGGGAAATCTTCCTCCAGGATATAAGATCACTCTGATTGATATAGGACT 
TGTTATTGAATATCTCATGGGAGGAACCTACAGATGCACCTATACTAGGAAACGTTTTCG 
ATTAATATATAATAGTCTTGGTGGAAATAATCGGAGGTCTGGCCGAAATACCTCCAGCAG 
CACTCCTCAGTTGCGAAAGAGTCATGAATCTTTTGGCAATAGGGCAGATAAAAAGGAAAA 
AATGAGGCATAACCATTTCATTAAGACAGCACAGCCCTACCGACCAAAGGTAGATACAGT 
TATGGAAGAAGGAAAGAAGAAAAGAACCAAAGATGAAATTGTAGACATTGATGATCCAGA 
AACCAAGCGCTTTCCTTATCCACTTAATGAACTTTTAATTTGGGCTTGCCTTATGAAGAG 
GCAGGTCATGGCCCGTTTTTTATGGCAACATGGTGAAGAATCAATGGCTAAAGCATTAGT 
TGCCTGTAAGATCTATCGTTCAATGGCATATGAAGCAAAGCAGAGTGACCTGGTAGATGA 
TACTTCAGAAGAACTAAAACAGTATTCCAGTGATTTTGGTCAGTTGGCCGTTGAATTATT 
AGAACAGTCCTTCAGACAAGATGAAACCATGGCTATGAAATTGCTCACTTATGAACTGAA 
GAACTGGAGTAATTCAACCTGCCTTAAGTTAGCAGTTTCTTCAAGACTTAGACCTTTTGT 
AGCTCACACCTGTACACAAATGTTGTTATCTGATATGTGGATGGGAAGGCTGAATATGAG 
GAAAAATTCCTGGTACAAGGTAATACTAAGCATTTTAGTTCCACCTGCCATATTGCTGTT 
AGAGTATAAAACTAAGGCTGAAATGTCCCATATCCCACAATCTCAAGATGCTCATCAGAT 
GACAATGGATGACAGCGAAAACAACTTTCAGAACATAACAGAAGAGATCCCCATGGAAGT 
GTTTAAAGAAGTACGGATTTTGGATAGTAATGAAGGAAAGAATGAGATGGAGATACAAAT 
GAAATCAAAAAAGCTTCCAATTACGCGAAAGTTTTATGCCTTTTATCATGCACCAATTGT 
AAAATTCTGGTTTAACACGTTGGCATATTTAGGATTTCTGATGCTTTATACATTTGTGGT 
TCTTGTACAAATGGAACAGTTACCTTCAGTTCAAGAATGGATTGTTATTGCTTATATTTT 
TACTTATGCCATTGAGAAAGTCCGTGAGGTATTTATGTCTGAAGCTGGGAAAGTAAACCA 
GAAGATTAAAGTATGGTTTAGTGATTACTTCAACATCAGTGATACAATTGCCATAATTTC 
TTTCTTCATTGGATTTGGACTAAGATTTGGAGCAAAATGGAACTTTGCAAATGCATATGA 
TAATCATGTTTTTGTGGCTGGAAGATTAATTTACTGTCTTAACATAATATTTTGGTATGT 
GCGTTTGCTAGATTTTCTAGCTGTAAATCAACAGGCAGGACCTTATGTAATGATGATTGG 
AAAAATGGTGGCCAATATGTTCTACATTGTAGTGATTATGGCTCTTGTATTACTTAGTTT 
TGGTGTTCCCAGAAAGGCAATACTTTATCCTCATGAAGCACCATCTTGGACTCTTGCTAA 
AGATATAGTTTTTCACCCATACTGGATGATTTTTGGTGAAGTTTATGCATACGAAATTGA 
TGTGTGTGCAAATGATTCTGTTATCCCTCAAATCTGTGGTCCTGGGACGTGGTTGACTCC 
ATTTCTTCAAGCAGTCTACCTCTTTGTACAGTATATCATTATGGTTAATCTTCTTATTGC 
ATTTTTCAGCAATGTGTATTTACAAGTGAAGGCAATTTCCAATATTGTATGGAAGTACCA 
GCGTTATCATTTTATTATGGCTTATCATGAGAAACCAGTTCTGCCTCCTCCACTTATCAT 
TCTTAGCCATATAGTTTCTCTGTTTTGCTGCATATGTAAGAGAAGAAAGAAAGATAAGAC 
TTCCGATGGACCAGAACTTTTCTTAACAGAAGAAGATCAAAAGAAACTTCATGATTTTGA 
AGAGCAGTGTGTTGAAATGTATTTCAATGAAAAAGATGACAAATTTCATTCTGGGAGTGA 
AGAGAGAATTCGTGTCACTTTTGAAAGAGTGGAACAGATGTGCATTCAGATTAAAGAAGT 
TGGAGATCGTGTCAACTACATAAAAAGATCATTACAATCATTAGATTCTCAAATTGGCCA 
TTTGCAAGATCTTTCAGCCCTGACGGTAGATACATTAAAAACACTCACTGCCCAGAAAGC 
GTCGGAAGCTAGCAAAGTTCATAATGAAATCACACGAGAACTGAGCATTTCCAAACACTT 
GGCTCAAAACCTTATTGATGATGGTCCTGTAAGACCTTCTGTATGGAAAAAGCATGGTGT 
TGTAAATACACTTAGCTCCTCTCTTCCTCAAGGTGATCTTGAAAGTAATAATCCTTTTCA
TTGTAATATTTTAATGAAAGATGACAAAGATCCCCAGTGTAATATATTTGGTCAAGACTT
ACCTGCAGTACCCCAGAGAAAAGAATTTAATTTTCCAGAGGCTGGTTCCTCTTCTGGTGC 
CTTATTCCCAAGTGCTGTTTCCCCTCCAGAACTGCGACAGAGACTACATGGGGTAGAACT 
CTTAAAAATATTTAATAAAAATCAAAAATTAGGCAGTTCATCTACTAGCATACCACATCT 
GTCATCCCCACCAACCAAATTTTTTGTTAGTACACCATCTCAGCCAAGTTGCAAAAGCCA 
CTTGGAAACTGGAACCAAAGATCAAGAAACTGTTTGCTCTAAAGCTACAGAAGGAGATAA 
TACAGAATTTGGAGCATTTGTAGGTCACAGAGATAGCATGGATTTACAGAGGTTTAAAGA 
AACATCAAACAAGATAAAATTGCAGAATAACAATACTTCTGAAAACACTTTGAAACGAGT 
GAGTTCTCTTGCTGGATTTACTGACTGTCACAGAACTTCCATTCCTGTTCATTCAAAACA 
AGCAGAAAAAATCAGTAGAAGGCCATCTACCGAAGACACTCATGAAGTAGATTCCAAAGC 
AGCTTTATTACTGAAGGATTGGTTACAAGATAGACCATCAAACAGAGAAATGGGTCTCAC 
TTCTCCATTTAAGCCAGCTATGGATACAAATTACTATTATTCAGCTGTGGAAAGAAATAA 
CTTGATGAGGTTATCACAGAGCATTCCATTTACACCTGTGCCTCCAAGAGGGGAGCCTGT 
CACAGTGTATCGTTTGGAAGAGAGTTCACCCAACATACTAAATAACAGCATGTCTTCTTG 
GTCACAACTAGGCCTCTGTGCCAAAATAGAGTTTTTAAGCAAAGAGGAGATGGGAGGAGG 
TTTACGAAGAGCTGTCAAAGTACAGTGTACCTGGTCAGAACATGATATCCTCAAATCAGG 
GCATCTTTATATTATCAAATCTTTTCTTCCAGAGGTGGTTAATACATGGTCAAGTATTTA 
CAAAGAAGATACAGTTCTGCATCTCTGTCTGAGAGAAATTCAACAACAGAGAGCAGCACA 
AAAGCTTACGTTTGCCTTTAATCAAATGAAACCCAAATCCATACCATATTCTCCAAGGTT 
CCTTGAAGTTTTCCTGCTGTATTGCCATTCAGCAGGACAGTGGTTTGCTGTGGAAGAATG 
TATGACTGGAGAATTTAGAAAATACAACAATAATAATGGAGATGAGATTATTCCAACTAA 
TACTCTGGAAGAGATCATGCTAGCCTTTAGCCACTGGACTTACGAATATACAAGAGGGGA 
GTTACTGGTACTTGATTTGCAAGGTGTTGGTGAAAATTTGACTGACCCATCTGTGATAAA 
AGCAGAAGAAAAGAGATCCTGTGATATGGTTTTTGGCCCAGCAAATCTAGGAGAAGATGC 
AATTAAAAACTTCAGAGCAAAACATCACTGTAATTCTTGCTGTAGAAAGCTTAAACTTCC 
AGATCTGAAGAGGAATGATTATACGCCTGATAAAATTATATTTCCTCAGGATGAGCCTTC 
AGATTTGAATCTTCAGCCTGGAAATTCCACCAAAGAATCAGAATCAACTAATTCTGTTCG
TCTGATGTTATAATATTAATATTACTGAATCATTGGTTTTGCCTGCACCTCACAGAA



In a search of public sequence databases, the NOV4a nucleic acid sequence, located
on chromosome 15, has 4374 of 4825 bases (90%) identical to a
gb:GENBANK-ID:AF149013|acc:AF149013.1 mRNA from Mus musculus (Mus musculus transient receptor
potential-related protein (ChaK) mRNA, complete cds). Public nucleotide databases include
all GenBank databases and the GeneSeq patent database.

The disclosed NOV4a polypeptide (SEQ ID NO:8) encoded by SEQ ID NO:7 has 
1856 amino acid residues and is presented in Table 4B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV4a has no signal peptide and is 
likely to be localized at the plasma membrane with a certainty of 0.6000. 



Table 4B. Encoded NOV4a protein sequence (SEQ ID NO:8). 



MSQKSWIESTLTKRECVYIIPSSKDPHRCLPGCQICQQLVRCFCGRLVKQHACFTASLAM
KYSDVKLGDHFNQAIEEWSVEKHTEQSPTDAYGVINFQGGSHSYRAKYVRLSYDTKPEVI
LQLLLKEWQMELPKLVISVHGGMQKFELHPRIKQLLGKGLIKAAVTTGAWILTGGVNTGV
AKHVGDALKEHASRSSRKICTIGIAPWGVIENRNDLVGRDVVAPYQTLLNPLSKLNVLNN
LHSHFILVDDGTVGKYGAEVRLRRELEKTINQQRIHAIGQGVPVVALIFEGGPNVILTVL
EYLQESPPVPVVVCEGTGRAADLLAYIHKQTEEGGNLPDAAEPDIISTIKKTFNFGQNEA
LHLFQTLMECMKRKELITVFHIGSDEHQDIDVAILTALLKGTNASAFDQLILTLAWDRVD
IAKNHVFVYGQQWLVGSLEQAMLDALVMDRVAFVKLLIENGVSMHKFLTIPRLEELYNTK
QGPTNPMLFHLVRDVKQGNLPPGYKITLIDIGLVIEYLMGGTYRCTYTRKRFRLIYNSLG
GNNRRSGRNTSSSTPQLRKSHESFGNRADKKEKMRHNHFIKTAQPYRPKVDTVMEEGKKK
RTKDEIVDIDDPETKRFPYPLNELLIWACLMKRQVMARFLWQHGEESMAKALVACKIYRS
MAYEAKQSDLVDDTSEELKQYSSDFGQLAVELLEQSFRQDETMAMKLLTYELKNWSNSTC
LKLAVSSRLRPFVAHTCTQMLLSDMWMGRLNMRKNSWYKVILSILVPPAILLLEYKTKAE
MSHIPQSQDAHQMTMDDSENNFQNITEEIPMEVFKEVRILDSNEGKNEMEIQMKSKKLPI
TRKFYAFYHAPIVKFWFNTLAYLGFLMLYTFVVLVQMEQLPSVQEWIVIAYIFTYAIEKV
REVFMSEAGKVNQKIKVWFSDYFNISDTIAIISFFIGFGLRFGAKWNFANAYDNHVFVAG
RLIYCLNIIFWYVRLLDFLAVNQQAGPYVMMIGKMVANMFYIVVIMALVLLSFGVPRKAI
LYPHEAPSWTLAKDIVFHPYWMIFGEVYAYEIDVCANDSVIPQICGPGTWLTPFLQAVYL
FVQYIIMVNLLIAFFSNVYLQVKAISNIVWKYQRYHFIMAYHEKPVLPPPLIILSHIVSL
FCCICKRRKKDKTSDGPELFLTEEDQKKLHDFEEQCVEMYFNEKDDKFHSGSEERIRVTF
ERVEQMCIQIKEVGDRVNYIKRSLQSLDSQIGHLQDLSALTVDTLKTLTAQKASEASKVH
NEITRELSISKHLAQNLIDDGPVRPSVWKKHGVVNTLSSSLPQGDLESNNPFHCNILMKD
DKDPQCNIFGQDLPAVPQRKEFNFPEAGSSSGALFPSAVSPPELRQRLHGVELLKIFNKN
QKLGSSSTSIPHLSSPPTKFFVSTPSQPSCKSHLETGTKDQETVCSKATEGDNTEFGAFV
GHRDSMDLQRFKETSNKIKLQNNNTSENTLKRVSSLAGFTDCHRTSIPVHSKQAEKISRR
PSTEDTHEVDSKAALLLKDWLQDRPSNREMGLTSPFKPAMDTNYYYSAVERNNLMRLSQS
IPFTPVPPRGEPVTVYRLEESSPNILNNSMSSWSQLGLCAKIEFLSKEEMGGGLRRAVKV
QCTWSEHDILKSGHLYIIKSFLPEVVNTWSSIYKEDTVLHLCLREIQQQRAAQKLTFAFN
QMKPKSIPYSPRFLEVFLLYCHSAGQWFAVEECMTGEFRKYNNNNGDEIIPTNTLEEIML
AFSHWTYEYTRGELLVLDLQGVGENLTDPSVIKAEEKRSCDMVFGPANLGEDAIKNFRAK
HHCNSCCRKLKLPDLKRNDYTPDKIIFPQDEPSDLNLQPGNSTKESESTNSVRLML



A search of sequence databases reveals that the NOV4a amino acid sequence has 1747
of 1863 amino acid residues (93%) identical to, and 1803 of 1863 amino acid residues (96%)
similar to, the 1863 amino acid residue ptnr:SPTREMBL-ACC:Q9JLQ1 protein from Mus
musculus (Mouse) (TRANSIENT RECEPTOR POTENTIAL-RELATED PROTEIN). Public
amino acid databases include the GenBank databases, SwissProt, PDB and PIR.

NOV4a is expressed in at least Adrenal Gland/Suprarenal gland, Bone Marrow, Brain,
Bronchus, Cartilage, Colon, Hippocampus, Kidney, Liver, Lymph node, Skeletal Muscle,
Stomach, Substantia Nigra, Tonsils and Whole Organism. This information was derived by
determining the tissue sources of the sequences that were included in the invention including
but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE
sources.



NOV4b 

A disclosed NOV4b nucleic acid of 5626 nucleotides (also referred to as CG57547-02)
encoding a TRANSIENT RECEPTOR POTENTIAL-RELATED PROTEIN-like protein is shown in
Table 4C. Putative untranslated regions upstream and/or downstream from the coding
region, if any, are underlined, and the start and stop codons are in bold letters.



Table 4C. NOV4b nucleotide sequence (SEQ ID NO:9).

AGTCCCCAGCCCCGTCGCCGGCGGAGGCGGGCGCGGGCGCGTTCCTGTGGCCAGTCACCC 
GGAGGAGTTGGTCGCACAATTA TGAAAGACTCGGCTTCTGCTGCTAGCGCCGGAGCTGAG 
TTAGTCCTGAGAAGGTTTCCCTGGGCGTTCCTTGTCCGGCCTCTGCTGCCGCCTCCGGAG 
ACGCTTCCCGATAGATGGCTACAGGCCGCGGAGGAGGAGGAGGTGGAGTTGCTGCCCTTC 
CGGAGTCCGCCCCGTGAGGAGAATGTCCCAGAAATCCTGGATAGAAAGCACTTTGACCAA 
GAGGGAATGTGTATATATTATACCAAGTTCCAAGGACCCTCACAGGTGCCTTCCAGGATG 
TCAAATTTGTCAGCAACTCGTCAGGGTTTTTGTGGTCGCTTGGTCAAGCAACATGCTTGT
TTTACTGCAAGTCTTGCCATGAAATACTCAGATGTGAAATTGGGTGACCATTTTAATCAG
GCAATAGAAGAATGGTCTGTGGAAAAGCATACAGAACAGAGCCCAACGGATGCTTATGGA 
GTCATAAATTTTCAAGGGGGTTCTCATTCCTACAGAGCTAAGTATGTGAGGCTATCATAT 
GACACCAAACCTGAAGTCATTCTGCAACTTCTGCTTAAAGAATGGCAAATGGAGTTACCC 
AAACTTGTTATCTCTGTACATGGGGGCATGCAGAAATTTGAGCTTCACCCACGAATCAAG 
CAGTTGCTTGGAAAAGGTCTTATTAAAGCTGCAGTTACAACTGGAGCCTGGATTTTAACT 
GGAGGAGTAAACACAGGTACAGGTGTGGCAAAACATGTTGGAGATGCCCTCAAAGAACAT 
GCTTCCAGATCATCTCGAAAGATTTGCACTATCGGAATAGCTCCATGGGGAGTGATTGAA 
AACAGAAATGATCTTGTTGGGAGAGATGTAAGAATTATTTATCAAACCTTATTGAACCCC 
CTGAGCAAATTGAATGTTTTGAATAATCTGCATTCCCATTTCATATTGGTGGATGATGGC 
ACTGTTGGAAAGTATGGGGCGGAAGTCAGACTGAGAAGAGAACTTGAAAAAACTATTAAT 
CAGCAAAGAATTCATATTGGCCAGGGTGTCCCTGTGGTGGCACTTATATTTGAGGGTGGG 
CCAAATGTTATCCTCACAGTTCTTGAATACCTTCAGGAAAGCCCCCCTGTTCCAGTAGTT 
GTGTGTGAAGGAACAGGCAGAGCTGCAGATCTGCTAGCGTATATTCATAAACAAACAGAA 
GAAGGAGGGGATGCAGCAGAGCCCGATATTATTTCCACTATCAAAAAAACATTTAACTTT 
GGCCAGAATGAAGCACTTCATTTATTTCAAACACTGATGGAGTGCATGAAAAGAAAGGAG 
CTTGTAACTGTTTTCCATATTGGGTCAGATGAACATCAAGATATAGATGTAGCAATACTT 
ACTGCACTGCTAAAAGGTACTAATGCATCTGCATTTGACCAGCTTATCCTTACATTGGCA 
TGGGATAGAGTTGACATTGCCAAAAATCATGTATTTGTTTATGGACAGCAGTGGCCATTG 
CACTCCAGCCTGGGCAACAGAGTGAGACTCTCTCTCAAAAAAAAAAAACAAAAACAAAAA 
CAAAAACAAAAACAAAAACCAACACCTAGAAATTCAGAGTTAGTTGGATCCTTGGAACAA 
GCTATGCTTGATGCTCTTGTAATGGATAGAGTTGCATTTGTAAAACTTCTTATTGAAAAT 
GGAGTAAGCATGCATAAATTCCTTACCATTCCGAGACTGGAAGAACTTTACAACACTAAT 
CTTCCTCCAGGATATAAGATCACTCTGATTGATATAGGACTTGTTATTGAATATCTCATG 
GGAGGAACCTACAGATGCACCTATACTAGGAAACGTTTTCGATTAATATATAATAGTCTT 
GGTGGAAATAATCGGTTTTCCTTCCAGGAGCCCAACCACACTCGCACGGTAAATATTAGA 
GACAAATCTCCTCATGCTTCTGGCAAGAAGAAGGGAAAGAAGAAAAGAACCAAAGATGAA 
ATTGTAGACATTGATGATCCAGAAACCAAGCGCTTTCCTTATCCACTTAATGAACTTTTA 
ATTTGGGCTTGCCTTATGAAGAGGCAGGTCATGGCCCGTTTTTTATGGCAACATGGTGAA 
GAATCAATGGCTAAAGCATTAGTTGCCTGTAAGATCTATCGTTCAATGGCATATGAAGCA 
AAGCAGAGTGACCTGGTAGATGATACTTCAGAAGAACTAAAACAGTATTCCAAGGATTTT
GGTCAGTTGGCCGTTGAATTATTAGAACAGTCCTTCAGACAAGATGAAACCATGGCTATG 
AAATTGCTCACTTATGAACTGAAGAACTGGAGTAATTCAACCTGCCTTAAGTTAGCAGTT 
TCTTCAAGACTTAGACCTTTTGTAGCTCACACCTGTACACAAATGTTGTTATCTGATATG 
TGGATGGGAAGGCTGAATATGAGGAAAAATTCCTGGTACAAGGTAATACTAAGCATTTTA 
GTTCCACCTGCCATATTGCTGTTAGAGTATAAAACTAAGGCTGAAATGTCCCATATCCCA
CAATCTCAAGATGCTCATCAGATGACAATGGATGACAGCGAAAACAACAGTAATGAAGGA 
AAGAATGAGATGGAGATACAAATGAAATCAAAAAAGCTTCCAATTACGCGAAAGTTTTAT 
GCCTTTTATCATGCACCAATTGTAAAATTCTGGTTTAACACGTTGGCATATTTAGGATTT 
CTGATGCTTTATACATTTGTGGTTCTTGTACAAATGGAACAGTTACCTTCAGTTCAAGAA 
TGGATTGTTATTGCTTATATTTTTACTTATGCCATTGAGAAAGTCCGTGAGATCTTTATG 
TCTGAAGCTGGGAAAGTAAACCAGAAGATTAAAGTATGGTTTAGTGATTACTTCAACATC 
AGTGATACAATTGCCATAATTTCTTTCTTCATTGGATTTGGACTAAGATTTGGAGCAAAA 
TGGAACTTTGCAAATGCATATGATAATCATGTTTTTGTGGCTGGAAGATTAATTTACTGT 
CTTAACATAATATTTTGGTATGTGCGTTTGCTAGATTTTCTAGCTGTTAATCAACAGGCA 
GGACCTTATGTAATGATGATTGGAAAAATGGTAAATATGTTCTACATTGTAGTGATTATG 
GCTCTTGTATTACTTAGTTTTGGTGTTCCCAGAAAGGCAATACTTTATCCTCATGAAGCA 
CCATCTTGGACTCTTGCTAAAGATATAGTTTTTCACCCATACTGGATGATTTTTGGTGAA 
GTTTATGCATACGAAATTGATTGTGGTCCTGGGACGTGGTTGACTCCATTTCTTCAAGCA 
GTCTACCTCTTTGTACAGTATATCATTATGGTTAATCTTCTTATTGCATTTTTCAAGAGC 
AATGTGTATTTACAAGTGAAGGCAATTTCCAATATTGTATGGAAGTACCAGCGTTATCAT 
TTTATTATGGCTTATCATGAGAAACCAGTTCTGCCTCCTCCACTTATCATTCTTAGCCAT 
ATAGTTTCTCTGTTTTGCTGCATATGTAAGAGAAGAAAGAAAGATAAGACTTCCGATGGA 
CCAAGTAAGATAGAACTTTTCTTAACAGAAGAAGATCAAAAGAAACTTCATGATTTTGAA 
GAGCAGTGTGTTGAAATGTATTTCAATGAAAAAGATGACAAATTTCATTCTGGGAGTGAA 
GAGAGAATTCGTGTCACTTTTGAAAGAGTGGAACAGAAGCCCATTCAGATTAAAGAAGTT 
GGAGATCGTGTCAACTACATAAAAAGATCATTACAATCATTAGATTCTCAAATTGGCCAT 
TTGCAAGATCTTTCAGCCCTGACGGTAGATACATTAAAAACACTCACTGCCCAGAAAGCG 
TCGGAAGCTAGCAAAGTTCATAATGAAATCACACGAGAACTGAGCATTTCCAAACACTTG 
GCTCAAAACCTTATTGATGATGGTCCTGTAAGACCTTCTGTATGGAAAAAGCATGGTGTT 
GTAAATACACTTAGCTCCTCTCTTCCTCAAGGTGATCTTGAAAGTAATAATCCTTTTCAT 
TGTAATATTTTAATGAAAGATGACAAAGATCCCCAGTGTAATATATTTGGTCAAGACTTA
CCTGCAGTACCCCAGAGAAAAGAATTTAATTTTCCAGAGGCTGGTTCCTCTTCTGGTGCC
TTATTCCCAAGTGCTGTTTCCCCTCCAGAACTGCGACAGAGACTACATGGGGTAGAACTC 
TTAAAAATATTTAATAAAAATCAAAAATTAGGCAGTTCATCTACTAGCATACCACATCTG 
TCATCCCCACCAACCAAATTTTTTGTTAGTACACCATCTCAGCCAAGTTGCAAAAGCCAC 
TTGGAAACTGGAACCAAAGATCAAGAAACTGTTTGCTCTAAAGCTACAGAAGGAGATAAT 
ACAGAATTTGGAGCATTTGTAGGTCACAGAGATAGCATGGATTTACAGAGGTTTAAAGAA 
ACATCAAACAAGATAAAATTGCAGAATAACAATACTTCTGAAAACACTTTGAAACGAGTG 
AGTTCTCTTGCTGGATTTACTGACTGTCACAGAACTTCCATTCCTGTTCATTCAAAACAA 
GCAGAAAAAATCAGTAGAAGGCCATCTACCGAAGACACTCATGAAGTAGATTCCAAAGCA 
GCTTTAATACCGGATTGGTTACAAGATAGACCATCAAACAGAGAAATGGGTCTCACTTCT 
CCATTTAAGCCAGCTATGGATACAAATTACTATTATTCAGCTGTGGAAAGAAATAACTTG 
ATGAGGTTATCACAGAGCATTCCATTTACACCTGTGCCTCCAAGAGGGGAGCCTGTCACA 
GTGTATCGTTTGGAAGAGAGTTCACCCAACATACTAAATAACAGCATGTCTTCTTGGTCA 
CAACTAGGCCTCTGTGCCAAAATAGAGTTTTTAAGCAAAGAGGAGATGGGAGGAGGTTTA 
CGAAGAGCTGTCAAAGTACAGTGTACCTGGTCAGAACATGATATCCTCAAATCAGGGCAT 
CTTTATATTATCAAATCTTTTCTTCCAGAGGTGGTTAATACATGGTCAAGTATTTACAAA 
GAAGATACAGTTCTGCATCTCTGTCTGAGAGAAATTCAACAACAGAGAGCAGCACAAAAG 
CTTACGTTTGCCTTTAATCAAATGAAACCCAAATCCATACCATATTCTCCAGGGGAGTTA 
CTGGTACTTGATTTGCAAGGTGTTGGTGAAAATTTGACTGACCCATCTGTGATAAAAGCA 
GAAGAAAAGAGATCCTGTGATATGGTTTTTGGCCCAGCAAATCTAGGAGAAGATGCAATT 
AAAAACTTCAGAGCAAAACATCACTGTAATTCTTGCTGTAGAAAGCTTAAACTTCCAGAT 
CTGAAGAGGAATGATTATACGCCTGATAAAATTATATTTCCTCAGGATGAGCCTTCAGAT 
TTGAATCTTCAGCCTGGAAATTCCACCAAAGAATCAGAATCAACTAATTCTGTTCGTCTG 
ATGTTATA ATATTAATATTACTGAATCATTGGTTTTGCCTGCACCTCACAGAA ATGTTAC 
TGTGTCACTTTTCCCTCGGGAGGAAATTGTTTGGTAATATAGAAAG 



In a search of public sequence databases, the NOV4b nucleic acid sequence, located on
chromosome 15, has 1134 of 1246 bases (91%) identical to a
gb:GENBANK-ID:AF149013|acc:AF149013.1 mRNA from Mus musculus (Mus musculus transient receptor
potential-related protein (ChaK) mRNA, complete cds). Public nucleotide databases include
all GenBank databases and the GeneSeq patent database.

The disclosed NOV4b polypeptide (SEQ ID NO:10) encoded by SEQ ID NO:9 has
1815 amino acid residues and is presented in Table 4D using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV4b has no signal peptide and is 
likely to be localized at the plasma membrane with a certainty of 0.6000. 



Table 4D. Encoded NOV4b protein sequence (SEQ ID NO:10). 

MKDSASAASAGAELVLRRFPWAFLVRPLLPPPETLPDRWLQAAEEEEVELLPFRSPPREE 
NVPEILDRKHFDQEGMCIYYTKFQGPSQVPSRMSNLSATRQGFCGRLVKQHACFTASLAM 
KYSDVKLGDHFNQAIEEWSVEKHTEQSPTDAYGVINFQGGSHSYRAKYVRLSYDTKPEVI 
LQLLLKE WQMELPKLVI S VHGGMQKFELHPR I KQLLGKGL I KAAVTTGAW I LTGGVNTGT 
GVAKHVGDALKEHASRS SRKI CT IGI APWGVIENRNDLVGRDVR 1 1 YQTLLNPLS KLNVL 
NNLHSHF I LVDDGTVGKYGAEVRLRRELEKT INQQR IH IGQGVP WAL I FEGG PNVI LTV 
LEYLQE S P P VP VWCEGTGRAADLLAY IHKQTEEGGD AAE PD 1 1 ST I KKT FNFGQNE ALH 
LFQTLMECMKRKELVTVFHIGSDEHQDIDVAILTALLKGTNASAFDQLILTLAWDRVDIA 
KNHVFVYGQQWPLHSSLGNRVRLSLKKKKQKQKQKQKQKPTPRNSELVGSLEQAMLDALV 
MDRVAFVKLLIENGVSMHKFLTIPRLEELYNTNLPPGYKITLIDIGLVIEYLMGGTYRCT 
YTRKRFRL I YNS LGGNNRFS FQEPNHTRTVN I RDKS PHAS GKKKGKKKRTKDE I VD I DDP 
ETKRFPYPLNELLIWACLMKRQVMARFLWQHGEESMAKALVACKIYRSI^IAYEAKQSDLVD 
DTSEELKQYSKDFGQLAVELLEQSFRQDETMAMKLLTYELKNWSNSTCLKLAVSSRLRPF 
VAHTCTQMLLSDMWMGRLNMRKNSWYKVILSILVPPAILLLEYKTKAEMSHIPQSQDAHQ 
MTMDDSENNSNEGKNEMEIQMKSKKLPITRKFYAFYHAPIVKFWFNTIjAYLGFLMLYTFV 
VLVQMEQLPSVQEWIVIAYIFTYAIEKVREIFMSEAGKVNQKIKVWFSDYFNISDTIAII 
S FFI GFGLRFGAKWNFANAYDNHVFVAGRL I YCLNI I FWYVRLLDFLAVNQQAGPYVMMI 
GKMVNMFY I W I MAL VLL S FGVPRKA I L Y PHE AP S VJTLAKD I VFHP YWM I FGE VYA YE I D 
CGPGTWLTPFLQAVYLFVQYI IMVKLLI AFFKSNVYLQVKAI SNI WKYQRYHFIMAYHE 
KPVLPPPLIILSHIVSLFCCICKRRKKDKTSDGPSKIELFLTEEDQKKLHDFEEQCVEMY 
FNEKDDKFHSGSEERIRVTFERVEQKPIQIKEVGDRVNYIKRSLQSLDSQIGHLQDLSAL 
TVDTLKTLTAQKASE ASKVHNE I TRELS I SKHLAQNLIDDGP VRP S VWKKHGWNTLS S S 
LPQGDLESNNPFHCNILMKDDKDPQCNIFGQDLPAVPQRKEFNFPEAGSSSGALFPSAVS 
PPELRQRLHGVELLKIFNKNQKLGSSSTSIPHLSSPPTKFFVSTPSQPSCKSHLETGTKD 
QETVC S KATEGDNTE FGAFVGHRDSMDLQRFKETSNKI KLQNNNTS ENTLKRVS S LAG FT 
DCHRTSIPVHSKQAEKISRRPSTEDTHEVDSKAALIPDWLQDRPSNREMGLTSPFKPAMD 
TNYYYSAVERNNLMRLSQSIPFTPVPPRGEPVTVYRLEESSPNILNNSMSSWSQLGLCAK 
IEFLSKEEMGGGLRRAVKVQCTWSEHDILKSGHLYIIKSFLPEVVNTWSSIYKEDTVLHL 
CLREIQQQRAAQKLTFAFNQMKPKSIPYSPGELLVLDLQGVGENLTDPSVIKAEEKRSCD 
MVFGPANLGEDAIKNFRAKHHCNSCCRKLKLPDLKRNDYTPDKIIFPQDEPSDLNLQPGN 
STKESESTNSVRLML 



A search of sequence databases reveals that the NOV4b amino acid sequence has 776 
of 892 amino acid residues (86%) identical to, and 819 of 892 amino acid residues (91%) 
similar to, the 1863 amino acid residue ptnr:SPTREMBL-ACC:Q9JLQ1 protein from Mus 
musculus (Mouse) (TRANSIENT RECEPTOR POTENTIAL-RELATED PROTEIN). Public 
amino acid databases include the GenBank databases, SwissProt, PDB and PIR. 
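
The percentages quoted in these database comparisons appear to be the number of matching positions divided by the aligned length, truncated to a whole number (776 of 892 is 86.99%, reported as 86%). A minimal Python sketch of that arithmetic follows. The function name percent_truncated is illustrative only and is not part of any search tool, and the truncation rule is inferred from the figures in the text rather than stated by it.

```python
def percent_truncated(matches: int, aligned_length: int) -> int:
    """Return matches / aligned_length as a whole percentage, rounded down."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= matches <= aligned_length:
        raise ValueError("matches must lie between 0 and aligned_length")
    # Integer arithmetic avoids floating-point rounding surprises.
    return (100 * matches) // aligned_length


# Figures quoted in the text for NOV4b.
nucleotide_identity = percent_truncated(1134, 1246)  # nucleic acid identity
protein_identity = percent_truncated(776, 892)       # amino acid identity
protein_similarity = percent_truncated(819, 892)     # amino acid similarity

print(nucleotide_identity, protein_identity, protein_similarity)
```

Ordinary rounding would give 87% for the amino acid identity, so truncation is the reading that agrees with the quoted values.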

NOV4b is expressed in at least adrenal gland, bone marrow, brain - amygdala, brain - 
cerebellum, brain - hippocampus, brain - substantia nigra, brain - thalamus, brain - whole, fetal 
brain, fetal kidney, fetal liver, fetal lung, heart, kidney, lymphoma - Raji, mammary gland, 
pancreas, pituitary gland, placenta, prostate, salivary gland, skeletal muscle, small intestine, 
spinal cord, spleen, stomach, testis, thyroid, trachea and uterus. This information was derived 
by determining the tissue sources of the sequences that were included in the invention 
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or 
RACE sources. 

The disclosed NOV4a polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 4E. 



Table 4E. BLAST results for NOV4 


Gene Index/Identifier                 Protein/Organism                        Length (aa)  Identity (%)       Positives (%)      Expect

gi|13959785|gb|AAK44211.1|            LTRPC7 [Homo sapiens]                   1865         1847/1866 (98%)    1853/1866 (98%)    0.0
(AY032950)

gi|13562153|gb|AAK19738.2|AF346629_1  channel-kinase 1 [Homo sapiens]         1864         1845/1866 (98%)    1851/1866 (98%)    0.0
(AF346629)

gi|14009344|gb|AAK50377.1|            LTRPC7 [Mus musculus]                   1863         1748/1866 (93%)    1803/1866 (95%)    0.0
(AY032951)

[identifier illegible]                receptor potential M7; transient        [illegible]  [illegible] (93%)  [illegible] (96%)  0.0
                                      receptor potential-related protein,
                                      ChaK [Mus musculus]

gi|14211383|gb|AAK57433.1|AF376052_1  transient receptor potential            1862         1747/1866 (93%)    1802/1866 (95%)    0.0
(AF376052)                            phospholipase C interacting kinase
                                      [Mus musculus]


Tables 4F-G list the domain descriptions from DOMAIN analysis results against 
NOV4. This indicates that the NOV4 sequence has properties similar to those of other 
proteins known to contain these domains. 



Table 4F. Domain Analysis of NOV4 

gnl|Pfam|pfam02816, MHCK_EF2_kinase, MHCK/EF2 kinase domain family. 
This family is a novel family of eukaryotic protein kinase catalytic 
domains, which have no detectable similarity to conventional kinases. 
The family contains myosin heavy chain kinases and Elongation Factor-2 
kinase and a bifunctional ion channel. 

CD-Length = 206 residues, 94.7% aligned 

Score = 79.7 bits (195), Expect = 1e-15 



Table 4G. Domain Analysis of NOV4 

gnl|Pfam|pfam00520, ion_trans, Ion transport protein. This family 
contains Sodium, Potassium, Calcium ion channels. This family is 6 
transmembrane helices in which the last two helices flank a loop which 
determines ion selectivity. In some sub-families (e.g. Na channels) 
the domain is repeated four times, whereas in others (e.g. K channels) 
the protein forms as a tetramer in the membrane. 

CD-Length = 191 residues, 99.0% aligned 

Score = 62.8 bits (151), Expect = 2e-10 
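
The domain tables report each score twice, as a bit score and, in parentheses, as a raw score. The sketch below shows one way the two may be related, using the usual normalization S' = (lambda * S - ln K) / ln 2. The values of lambda and K are not given in the text; the constants below are an assumption (values commonly associated with gapped BLOSUM62 scoring), chosen because they are consistent with the scores reported in Tables 4F and 4G.

```python
import math

# ASSUMED statistical parameters; the text does not state them.
ASSUMED_LAMBDA = 0.267
ASSUMED_K = 0.041


def bit_score(raw_score: float,
              lam: float = ASSUMED_LAMBDA,
              k: float = ASSUMED_K) -> float:
    """Convert a raw alignment score to a normalized bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Raw scores and reported bit scores from Tables 4F and 4G.
for raw, reported in [(195, 79.7), (151, 62.8)]:
    print(f"raw {raw}: computed {bit_score(raw):.1f} bits, reported {reported} bits")
```

If a different scoring system was used for the original searches, lambda and K would differ and the constants above would need to be replaced.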



Capacitative calcium entry (CCE) describes Ca2+ influx into cells that replenishes 
Ca2+ stores emptied through the action of IP3 and other agents. It is an essential component 
of cellular responses to many hormones and growth factors. The molecular basis of this form 
of Ca2+ entry is complex and may involve more than one type of channel. Studies on visual 
signal transduction in Drosophila led to the hypothesis that a protein encoded in transient 
receptor potential (Trp) and related proteins may be a component of CCE channels. Zhu et al. 
reported the existence of six trp-related genes in the mouse genome. Expression in L cells of 
small portions of these genes in antisense orientation suppressed CCE. Expression in COS 
cells of two full-length cDNAs encoding human trp homologs, Htrp1 and Htrp3, increased 
CCE. This identifies mammalian gene products that participate in CCE. 

Human TRPC genes encode proteins with sequence similarity to the Drosophila 
'transient receptor potential' (trp) gene product. TRPC proteins are thought to be subunits of 
capacitative calcium entry (CCE) channels, which mediate calcium influx into cells to 
replenish internal stores of calcium. Using exon trapping on a contig from 21q22.3, Kudoh et 
al. (1997) isolated an exon whose deduced amino acid sequence shows similarity to the 
sequences of human TRPC and Drosophila trp proteins. Nagamine et al. (1998) isolated 
human fetal brain and caudate nucleus cDNAs corresponding to the exon and its parent gene. 
The deduced 1,503-amino acid protein, which is named TRPC7, is 22.9% identical to human 
TRPC1 (602343), 21.2% identical to human TRPC3 (602345), and 22.6% identical to 
Drosophila trp. TRPC7 contains 7 predicted membrane-spanning domains. The TRPC7 gene 
has 32 exons spanning approximately 90 kb. Northern blot analysis of human tissues detected 
a 6.5-kb TRPC7 transcript predominantly in fetal and adult brains, where it was expressed in 
several regions. In caudate nucleus and putamen, a putative 5.5-kb alternatively spliced 
TRPC7 product also was detected. 

The disclosed NOV4 nucleic acid of the invention encoding a TRANSIENT 
RECEPTOR POTENTIAL-RELATED PROTEIN-like protein includes the nucleic acid 
whose sequence is provided in Table 4A or 4C or a fragment thereof. The invention also 
includes a mutant or variant nucleic acid any of whose bases may be changed from the 
corresponding base shown in Table 4A or 4C while still encoding a protein that maintains its 
TRANSIENT RECEPTOR POTENTIAL-RELATED PROTEIN-like activities and 
physiological functions, or a fragment of such a nucleic acid. The invention further includes 
nucleic acids whose sequences are complementary to those just described, including nucleic 
acid fragments that are complementary to any of the nucleic acids just described. The 
invention additionally includes nucleic acids or nucleic acid fragments, or complements 
thereto, whose structures include chemical modifications. Such modifications include, by way 
of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones 
are modified or derivatized. These modifications are carried out at least in part to enhance the 
chemical stability of the modified nucleic acid, such that they may be used, for example, as 
antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or 
variant nucleic acids, and their complements, up to about 10 percent of the bases may be so 
changed. 

The disclosed NOV4 protein of the invention includes the TRANSIENT RECEPTOR 
POTENTIAL-RELATED PROTEIN-like protein whose sequence is provided in Table 4B or 
4D. The invention also includes a mutant or variant protein any of whose residues may be 
changed from the corresponding residue shown in Table 4B or 4D while still encoding a 
protein that maintains its TRANSIENT RECEPTOR POTENTIAL-RELATED PROTEIN-like 
activities and physiological functions, or a functional fragment thereof. In the mutant or 
variant protein, up to about 7 percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or 
(Fab)2, that bind immunospecifically to any of the proteins of the invention. 

The above defined information for this invention suggests that this TRANSIENT 
RECEPTOR POTENTIAL-RELATED PROTEIN-like protein (NOV4) may function as a 
member of a "TRANSIENT RECEPTOR POTENTIAL-RELATED PROTEIN family". 
Therefore, the NOV4 nucleic acids and proteins identified here may be useful in potential 
therapeutic applications implicated in (but not limited to) various pathologies and disorders as 
indicated below. The potential therapeutic applications for this invention include, but are not 
limited to: protein therapeutic, small molecule drug target, antibody target (therapeutic, 
diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene 
therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of 
all tissues and cell types composing (but not limited to) those defined here. 

The NOV4 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the TRANSIENT 
RECEPTOR POTENTIAL-RELATED PROTEIN-like protein (NOV4) may be useful in gene 
therapy, and the TRANSIENT RECEPTOR POTENTIAL-RELATED PROTEIN-like protein 
(NOV4) may be useful when administered to a subject in need thereof. By way of nonlimiting 
example, the compositions of the present invention will have efficacy for treatment of patients 
suffering from adrenoleukodystrophy, congenital adrenal hyperplasia, hemophilia, 
hypercoagulation, idiopathic thrombocytopenic purpura, autoimmune disease, allergies, 
immunodeficiencies, transplantation, graft versus host disease, Von Hippel-Lindau (VHL) 
syndrome, Alzheimer's disease, stroke, tuberous sclerosis, hypercalcemia, Parkinson's disease, 
Huntington's disease, cerebral palsy, epilepsy, Lesch-Nyhan syndrome, multiple sclerosis, 
ataxia-telangiectasia, leukodystrophies, behavioral disorders, addiction, anxiety, pain, 
neurodegeneration, arthritis, tendonitis, diabetes, renal artery stenosis, interstitial nephritis, 
glomerulonephritis, polycystic kidney disease, systemic lupus erythematosus, renal tubular 
acidosis, IgA nephropathy, cirrhosis, lymphedema, ulcers, tonsillitis, or other pathologies or 
conditions. The NOV4 nucleic acid encoding the TRANSIENT RECEPTOR 
POTENTIAL-RELATED PROTEIN-like protein of the invention, or fragments thereof, may 
further be useful in diagnostic applications, wherein the presence or amount of the nucleic 
acid or the protein is to be assessed. 

NOV4 nucleic acids and polypeptides are further useful in the generation of antibodies 
that bind immuno-specifically to the novel NOV4 substances for use in therapeutic or 
diagnostic methods. These antibodies may be generated according to methods known in the 
art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies" 
section below. The disclosed NOV4 proteins have multiple hydrophilic regions, each of 
which can be used as an immunogen. These novel proteins can be used in assay systems for 
functional analysis of various human disorders, which will help in understanding of pathology 
of the disease and development of new drug targets for various disorders. 



NOV5 

A disclosed NOV5 nucleic acid of 1869 nucleotides (also referred to as CG57609-01) 
encoding an Epsin-3-like protein is shown in Table 5A. Putative untranslated regions upstream 
and/or downstream from the coding region, if any, are underlined, and the start and stop 
codons are in bold letters. 



Table 5A. NOV5 nucleotide sequence (SEQ ID NO:11). 

GCGGGGGCGAGGGCCACCCACCTCCAAGTCTCCAGCCATGACGACCTCCGCACTCCGGCG 
CCAGGTGAAGAACATCGTGCACAACTACTCCGAGGCAGAAATCAAGGTGCGCGAGGCCAC 
CAGCAATGACCCCTGGGGCCCCCCTAGTTCGCTCATGTCCGAGATCGCTGACCTGACCTT 
CAACACAGTGGCCTTCACCGAAGTCATGGGCATGCTGTGGCGGCGGCTCAATGACAGCGG 
CAAGAACTGGCGGCACGTGTACAAGGCTCTAACATTGCTGGACTACCTGCTCAAGACGGG 
CTCCGAGCGGGTGGCCCACCAGTGCCGCGAGAACCTCTACACCATCCAGACACTCAAGGA 
CTTCCAGTACATCGACCGCGACGGCAAGGACCAGGGCGTCAACGTGCGCGAGAAGGTCAA 
GCAGGTGATGGCCCTGCTCAAGGATGAGGAGCGGCTGCGGCAGGAGCGAACCCACGCCCT 
CAAGACCAAGGAGCGCATGGCACTGGAGGGCATCGGCCCGCTGGTGCTGGGCTTCAGCCG 
CCGCTACGGCGAGGACTACAGCCGCTCCCGGGGCTCCCCGTCCTCCTACAACTCCTCCTC 
TTCGTCACCCCGCTATACCTCCGACCTGGAGCAGGCCCGGCCTCAGACGTCAGGGGAAGA 
GGAACTGCAGCTGCAGCTGGCCCTCGCCATGAGCCGTGAGGAGGCAGAGAAGGAGGTGAG 
GTCCTGGCAGGGTGATGGCTCCCCCATGGCCAATGGTGCAGGGGCCGTGGTCCACCATCA 
GCGGGACAGAGAGCCTGAGAGAGAAGAGAGAAAGGAGGAGGAGAAGCTAAAAACCAGCCA 
GTCCTCCATCCTGGACTTGGCTGACATCTTCGTACCTGCCCTGGCCCCGCCCTCCACACA 
CTGCTCTGCTGACCCATGGGACATCCCAGGTTTTAGGCCGAACACAGAGGCCAGTGGATC 
CTCCTGGGGGCCTTCTGCAGACCCCTGGTCTCCGATCCCCTCAGGAACCGTCCTGTCCCG 
AAGCCAGCCCTGGGATCTGACTCCCATGCTCTCCTCCTCTGAGCCCTGGGGCAGGACCCC 
AGTGCTGCCTGCTGGGCCCCCCACCACAGACCCCTGGGCCCTGAACTCTCCCCACCACAA 
ACTCCCCAGCACTGGGGCTGACCCTTGGGGAGCCTCCCTGGAGACCTCCGACACACCTGG 
TGGTGCCTCGACCTTTGACCCATTTGCCAAACCTCCAGAATCCACAGAGACCAAGGAGGG 
GCTGGAGCAGGCCCTGCCCTCTGGGAAGCCCAGCAGCAGCGGGGAGCTGGACCTGTTTGG 
AGACCCCAGCCCCAGTTCCAAGCAAAATGGCACGAAGGAGCCAGATGCCCTGGACCTGGG 
CATACTAGGGGAAGCACTAACCCAGCCAAGCAAAGAGGCCCGAGCTTGCCGGACTCCCGA 
GTCCTTCCTGGGTCCCTCAGCTTCCTCCTTGGTCAACCTTGACTCGTTGGTCAAGGCACC 
CCAGGTTGCAAAGACCCGGAACCCCTTCCTGACAGGTGGTCTCAGCGCTCCGTCCCCCAC 
CAACCCGTTCGGCGCGGGCGAGCCGGGCAGGCCGACGCTAAACCAGATGCGCACCGGCTC 
GCCGGCGCTGGGCCTGGCAGGCGGGCCTGTGGGGGCGCCCCTGGGCTCCATGACCTACAG 
CGCCTCTCTGCCCCTCCCGCTCAGCAGCGTGCCAGCTGGCTTGACCCTCCCCGCCTCGGT 
TAGCGTCTTCCCGCAGGCCGGAGCCTTCGCACCGCAGCCGCTGCTGCCCACGCCGAGCTC 
AGCCGGGCCGCGGCCCCCGCCCCCGCAGACCGGCACCAACCCCTTCCTCTGAGCCCCGCC 
CCGTCCCAT 






In a search of public sequence databases, the NOV5 nucleic acid sequence, located on 
chromosome 17, has 1210 of 1234 bases (98%) identical to a 
gb:GENBANK-ID:AK000785|acc:AK000785.1 mRNA from Homo sapiens (Homo sapiens cDNA FLJ20778 
fis, clone COL05704). Public nucleotide databases include all GenBank databases and the 
GeneSeq patent database. 

The disclosed NOV5 polypeptide (SEQ ID NO: 12) encoded by SEQ ID NO:11 has 
604 amino acid residues and is presented in Table 5B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV5 has no signal peptide and is 
likely to be localized in the cytoplasm with a certainty of 0.4500. 
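
The relationship between a disclosed nucleotide sequence and its encoded polypeptide can be illustrated with a short translation routine. The sketch below is a minimal example only: it assumes the standard genetic code and assumes that the first ATG in the sequence is the start codon, neither of which is stated explicitly in the text, and the function name translate_from_first_atg is hypothetical. It is applied to the first 60 nucleotides of Table 5A.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from_first_atg(dna: str) -> str:
    """Translate from the first ATG up to the first in-frame stop codon.

    Whitespace is ignored; a trailing partial codon is dropped; unknown
    codons are reported as 'X'. Returns '' if no ATG is present.
    """
    dna = "".join(dna.split()).upper()
    start = dna.find("ATG")
    if start < 0:
        return ""
    residues = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":  # stop codon ends the open reading frame
            break
        residues.append(amino_acid)
    return "".join(residues)


# First line of Table 5A (SEQ ID NO:11).
first_line = "GCGGGGGCGAGGGCCACCCACCTCCAAGTCTCCAGCCATGACGACCTCCGCACTCCGGCG"
print(translate_from_first_atg(first_line))
```

The residues produced from this first line agree with the start of the protein sequence presented in Table 5B.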



Table 5B. Encoded NOV5 protein sequence (SEQ ID NO:12). 



MTTSALRRQVKNIVHNYSEAEIKVREATSNDPWGPPSSLMSEIADLTFNTVAFTEVMGML 
WRRLNDSGKNWRHVYKALTLLDYLLKTGSERVAHQCRENLYTIQTLKDFQYIDRDGKDQG 
VNVREKVKQVMALLKDEERLRQERTHALKTKERMALEGIGPLVLGFSRRYGEDYSRSRGS 
PSSYNSSSSSPRYTSDLEQARPQTSGEEELQLQLALAMSREEAEKEVRSWQGDGSPMANG 
AGAVVHHQRDREPEREERKEEEKLKTSQSSILDLADIFVPALAPPSTHCSADPWDIPGFR 
PNTEASGSSWGPSADPWSPIPSGTVLSRSQPWDLTPMLSSSEPWGRTPVLPAGPPTTDPW 
ALNSPHHKLPSTGADPWGASLETSDTPGGASTFDPFAKPPESTETKEGLEQALPSGKPSS 
SGELDLFGDPSPSSKQNGTKEPDALDLGILGEALTQPSKEARACRTPESFLGPSASSLVN 
LDSLVKAPQVAKTRNPFLTGGLSAPSPTNPFGAGEPGRPTLNQMRTGSPALGLAGGPVGA 
PLGSMTYSASLPLPLSSVPAGLTLPASVSVFPQAGAFAPQPLLPTPSSAGPRPPPPQTGT 
NPFL 



A search of sequence databases reveals that the NOV5 amino acid sequence has 398 of 
441 amino acid residues (90%) identical to, and 406 of 441 amino acid residues (92%) similar 
to, the 632 amino acid residue ptnr:TREMBLNEW-ACC:AAG45223 protein from Homo 
sapiens (Human) (EPSIN 3). Public amino acid databases include the GenBank databases, 
SwissProt, PDB and PIR. 

NOV5 is expressed in at least skin. This information was derived by determining the 
tissue sources of the sequences that were included in the invention including but not limited to 
SeqCalling sources, Public EST sources, Literature sources, and/or RACE sources. 

The disclosed NOV5 polypeptide has homology to the amino acid sequences shown in 
the BLASTP data listed in Table 5C. 



Table 5C. BLAST results for NOV5 


Gene Index/Identifier     Protein/Organism                              Length (aa)  Identity (%)     Positives (%)    Expect

ref|XP_037971.1|          epsin 3 [Homo sapiens]                        632          598/633 (94%)    598/633 (94%)    0.0
(XM_037971)

ref|NP_060427.1|          epsin 3 [Homo sapiens]                        632          596/633 (94%)    597/633 (94%)    0.0
(NM_017957)

gb|AAH16454.1|AAH16454    (protein for MGC:25634) [Mus musculus]        636          518/646 (80%)    535/646 (82%)    0.0
(BC016454)

dbj|BAB26309.1|           homolog to EPSIN 3 - putative [Mus musculus]  569          452/579 (78%)    468/579 (80%)    0.0
(AK009469)

gb|AAH01038.1|AAH01038    Similar to epsin 3 [Homo sapiens]             208          186/205 (90%)    188/205 (90%)    e-101
(BC001038)



Tables 5E-F list the domain descriptions from DOMAIN analysis results against 
NOV5. This indicates that the NOV5 sequence has properties similar to those of other proteins 
known to contain these domains. 



Table 5E. Domain Analysis of NOV5 

gnl|Pfam|pfam01417, ENTH, ENTH domain. The ENTH (Epsin N-terminal 
homology) domain is found in proteins involved in endocytosis and 
cytoskeletal machinery. 

CD-Length = 123 residues, 100.0% aligned 

Score = [illegible] bits (442), Expect = 1e-44 



Table 5F. Domain Analysis of NOV5 

gnl|Smart|smart00288, VHS, Domain present in VPS-27, Hrs and STAM; 
Unpublished observations 

CD-Length = 133 residues, 88.7% aligned 

Score = 38.5 bits (88), Expect = 0.001 



The mammalian protein epsin is required for endocytosis. There are two homologous 
yeast proteins, Ent1p and Ent2p, which are similar to mammalian epsin. An essential function 
for the highly conserved N-terminal epsin N-terminal homology (ENTH) domain was revealed 
using deletions and randomly generated temperature-sensitive ent1 alleles. Changes in 
conserved ENTH domain residues in ent1(ts) cells revealed defects in endocytosis and actin 
cytoskeleton structure. The Ent1 protein was localized to peripheral and internal punctate 
structures, and biochemical fractionation studies found the protein associated with a large, 
Triton X-100-insoluble pellet. Finally, an Ent1p clathrin-binding domain was mapped to the 
final eight amino acids (RGYTLIDL*) in the Ent1 protein sequence. Based on these and other 
data, yeast epsin-like proteins are essential components of an endocytic complex that may act 
at multiple stages in the endocytic pathway. 

An approximately 140 amino acid domain is shared by a variety of proteins in budding 
and fission yeast, nematode, rat, mouse, frog, oat, and man. Typically, this domain is located 
within 20 residues of the N-terminus of the various proteins. The percent identity among the 
domains in the 12 proteins ranges from 42 to 93%, with 16 absolutely conserved residues. 
Even though these proteins share little beyond their segment of homology, data are emerging 
that several of the proteins are involved in endocytosis and/or regulation of cytoskeletal 
organization. This protein segment is the ENTH domain, for Epsin N-terminal Homology 
domain. 

The disclosed NOV5 nucleic acid of the invention encoding an Epsin-3-like protein 
includes the nucleic acid whose sequence is provided in Table 5A or a fragment thereof. The 
invention also includes a mutant or variant nucleic acid any of whose bases may be changed 
from the corresponding base shown in Table 5A while still encoding a protein that maintains 
its Epsin-3-like activities and physiological functions, or a fragment of such a nucleic acid. 
The invention further includes nucleic acids whose sequences are complementary to those just 
described, including nucleic acid fragments that are complementary to any of the nucleic acids 
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or 
complements thereto, whose structures include chemical modifications. Such modifications 
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least 
in part to enhance the chemical stability of the modified nucleic acid, such that they may be 
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. 
In the mutant or variant nucleic acids, and their complements, up to about 2 percent of the 
bases may be so changed. 
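
The "up to about 2 percent" limit can be made concrete with a simple position-by-position comparison. The following sketch is illustrative only: the helper names fraction_changed and max_changes are hypothetical, the comparison assumes substitutions between sequences of equal length (no insertions or deletions), and "about" is treated as an exact threshold purely for the purpose of the example.

```python
import math


def fraction_changed(reference: str, variant: str) -> float:
    """Fraction of positions at which two equal-length sequences differ."""
    reference = "".join(reference.split()).upper()
    variant = "".join(variant.split()).upper()
    if not reference or len(reference) != len(variant):
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for a, b in zip(reference, variant) if a != b)
    return differences / len(reference)


def max_changes(length: int, max_fraction: float) -> int:
    """Largest whole number of positions not exceeding max_fraction of length."""
    return math.floor(length * max_fraction)


# NOV5 nucleic acid: 1869 nucleotides, up to about 2 percent changed.
print(max_changes(1869, 0.02))
# NOV5 protein: 604 residues, up to about 10 percent changed.
print(max_changes(604, 0.10))

# First 60 nucleotides of Table 5A with one hypothetical substitution (last base G to A).
reference = "GCGGGGGCGAGGGCCACCCACCTCCAAGTCTCCAGCCATGACGACCTCCGCACTCCGGCG"
variant = reference[:-1] + "A"
print(fraction_changed(reference, variant) <= 0.02)
```

A sequence comparison of this kind addresses only the numerical limit; whether a given variant still encodes a protein that maintains Epsin-3-like activity is a separate, functional question.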

The disclosed NOV5 protein of the invention includes the Epsin-3-like protein whose 
sequence is provided in Table 5B. The invention also includes a mutant or variant protein any 
of whose residues may be changed from the corresponding residue shown in Table 5B while 
still encoding a protein that maintains its Epsin-3-like activities and physiological functions, or 
a functional fragment thereof. In the mutant or variant protein, up to about 10 percent of the 
residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or 
(Fab)2, that bind immunospecifically to any of the proteins of the invention. 

The above defined information for this invention suggests that this Epsin-3-like protein 
(NOV5) may function as a member of an "Epsin-3 family". Therefore, the NOV5 nucleic acids 
and proteins identified here may be useful in potential therapeutic applications implicated in 
(but not limited to) various pathologies and disorders as indicated below. The potential 
therapeutic applications for this invention include, but are not limited to: protein therapeutic, 
small molecule drug target, antibody target (therapeutic, diagnostic, drug targeting/cytotoxic 
antibody), diagnostic and/or prognostic marker, gene therapy (gene delivery/gene ablation), 
research tools, tissue regeneration in vivo and in vitro of all tissues and cell types composing 
(but not limited to) those defined here. 

The NOV5 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the Epsin-3-like protein 
(NOV5) may be useful in gene therapy, and the Epsin-3-like protein (NOV5) may be useful 
when administered to a subject in need thereof. By way of nonlimiting example, the 
compositions of the present invention will have efficacy for treatment of patients suffering 
from psoriasis, actinic keratosis, tuberous sclerosis, acne, hair growth/loss, alopecia, 
pigmentation disorders, endocrine disorders, or other pathologies or conditions. The NOV5 
nucleic acid encoding the Epsin-3-like protein of the invention, or fragments thereof, may 
further be useful in diagnostic applications, wherein the presence or amount of the nucleic acid 
or the protein is to be assessed. 

NOV5 nucleic acids and polypeptides are further useful in the generation of antibodies 
that bind immuno-specifically to the novel NOV5 substances for use in therapeutic or 
diagnostic methods. These antibodies may be generated according to methods known in the 
art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies" 
section below. The disclosed NOV5 proteins have multiple hydrophilic regions, each of 
which can be used as an immunogen. These novel proteins can be used in assay systems for 
functional analysis of various human disorders, which will help in understanding of pathology 
of the disease and development of new drug targets for various disorders. 

NOV6 

A disclosed NOV6 nucleic acid of 2646 nucleotides (also referred to as CG57611-01) 
encoding a CD22-like protein is shown in Table 6A. Putative untranslated regions upstream 
and/or downstream from the coding region, if any, are underlined, and the start and stop 
codons are in bold letters. 



Table 6A. NOV6 nucleotide sequence (SEQ ID NO: 13). 

ATGGACAACCCACAGGCTCTGCCACTCTTCCTACTCCTGGCCTCCTTGGTAGGGATCCTC 
ACCCTCAGAGCCTCTTCTGGACTTCAGCAAACCAACTTCTCCTCTGCCTTCTCTTCAGAC 
TCAAAGAGCTCTTCCCAGGGGCTGGGTGTGGAAGTTCCCTCCATCAAACCTCCCAGCTGG 
AAAGTTCCAGATCAGTTCCTGGATTCAAAAGCCTCTGCTGGAATCTCTGATTCCAGCTGG 
TTTCCTGAGGCCCTGAGTTCCAACATGTCTGGGTCCTTCTGGTCAAATGTTTCTGCTGAG 
GGCCAAGATTTGAGCCCGGTTTCCCCCTTCTCTGAAACCCCTGGTTCTGAAGTATTTCCT 
GATATTTCGGATCCTCAAGTTCCTGCCAAAGACCCCAAGCCTTCCTTCACTGTTAAGACC 
CCAGCTTCAAACATTTCTACTCAAGTCTCCCATACCAAACTGTCTGTTGAGGCCCCAGAT 
TCAAAATTCTCCCCGGATGATATGGATCTTAAACTCTCTGCCCAGAGCCCTGAATCCAAA 
TTTTCTGCAGAGACCCACTCAGCTGCAAGCTTTCCCCAGCAGGTGGGGGGCCCACTCGCT 
GTGCTGGTGGGGACCACCATCCGGCTCCCCCTAGTCCCAATCCCCAACCCTGGGCCCCCC 
ACCTCTCTGGTGGTCTGGCGCCGGGGCTCAAAGGTGCTGGCAGCTGGGGGCCTGGGGCCA 
GGGGCACCTCTGATCAGCCTGGACCCTGCTCACCGAGACCACCTGCGATTTGACCAGGCC 
CGGGGGGTTCTGGAGCTCGCCTCTGCCCAGCTGGACGATGCAGGGGTCTACACGGCTGAG 
GTCATCCGGGCAGGGGTCTCCCAGCAGACTCACGAGTTCACGGTGGGTGTGTATGAGCCC 
CTACCCCAGCTGTCGGTTCAGCCCAAGGCTCCAGAGACAGAGGAGGGGGCGGCCGAGCTC 
CGGCTGCGCTGCCTGGGGTGGGGGCCAGGTCGCGGGGAGCTGAGCTGGAGCCGGGACGGA 
CGCGCCCTGGAGGCGGCGGAATCGGAGGGAGCCGAGACGCCCCGGATGCGCTCAGAGGGC 
GACCAGCTGCTCATCGTGCGCCCTGTGCGCAGCGACCACGCCCGGTACACTTGCCGCGTC 
CGCAGCCCCTTCGGCCACAGGGAGGCTGCCGCCGACGTCAGCGTCTTCTACGGCCCGGAC 
CCGCCGACCATCACGGTCTCCTCGGACCGCGACGCCGCGCCTGCCCGCTTTGTCACCGCG 
GGCAGTAACGTGACCTTGCGCTGCGCCGCCGCCTCGCGGCCGCCCGCCGACATCACGTGG 
AGCCTGGCGGACCCGGCCGAGGCCGCGGTGCCCGCGGGGTCGCGCCTCCTGCTGCCCGCG 
GTCGGACCGGGCCACGCAGGCACCTACGCCTGCCTGGCGGCGAACCCGCGTACCGGCCGC 
CGCCGCCGCTCGCTGCTCAACCTTACAGTGGCGGACCTGCCCCCCGGGGCCCCACAGTGC 
TCAGTTGAAGGGGGTCCCGGGGACCGCAGCCTCCGCTTCCGCTGCTCGTGGCCCGGCGGG 
GCCCCTGCTGCCTCCCTGCAGTTCCAGGGTCTCCCCGAAGGCATCCGCGCCGGGCCAGTG 
TCCTCTGTGCTGCTGGCGGCCGTCCCCGCCCACCCCCGGCTCAGCGGCGTCCCCATCACC 
TGCCTTGCTCGCCACCTGGTGGCCACGCGTACCTGCACAGTCACGCCGGAGGCCCCCCGA 
GAGGTGCTGCTGCATCCGCTGGTGGCAGAGACACGGTTGGGGGAGGCAGAGGTGGCACTG 
GAGGCCTCTGGTTGTCCCCCACCCTCACGGGCATCCTGGGCCCGGGAAGGGAGGCCCCTG 
GCTCCAGGAGGCGGGAGTCGCCTGCGGCTCAGTCAAGATGGGCGGAAACTCCACATCGGC 
AACTTCAGCCTGGATTGGGACCTGGGAAATTACTCCGTGCTGTGCAGTGGGGCGCTGGGT 
GCTGGCGGTGACCAGATCACCCTCATTGGACCCTCCATATCCTCGTGGAGGCTTCAGAGA 
GCCAGAGATGCAGCCGTGCTGACTTGGGATGTGGAGCGCGGGGCCCTGATCAGCAGTTTT 
GAGATCCAGGCATGGCCAGATGGGCCTGCTCTGGGCAGGACTTCCACCTACAGGGACTGG 
GTCTCCCTGCTCATCCTGGGGCCTCAGGAGCGGTCAGCCGTGGTGCCCCTTCCACCTCGG 
AACCCAGGGACCTGGACCTTTCGGATCCTGCCCATCCTGGGGGGCCAGCCAGGGACTCCA 
TCACAAAGCCGGGTCTACCGGGCCGGCCCCACGTTGAGCCATGGGGCCATCGCTGGCATC 
GTCCTGGGCTCCCTGCTGGGCCTGGCGCTGCTAGCCGTACTTCTCCTCCTTTGCATCTGC 
TGCCTGTGCCGCTTTCGTGGAAAGACTCCTGAGAAAAAGAAGCATCCTTCTACCTTGGTC 
CCCGTGGTCACCCCCTCAGAAAAGAAGATGCATAGTGTGACCCCAGTGGAGATTTCATGG 
CCTCTGGACCTCAAAGTCCCTCTGGAGGACCACAGCTCAACTAGGGCCTACCAAAAGAAG 
AGTCTTCCTGTATTTGTGCAAGTAAGGAGATGTGACTTCTTGGCTGGGAAACTCTTGCTG 
ATCTAA 

In a search of public sequence databases, the NOV6 nucleic acid sequence, located on 
chromosome 19, has 941 of 1572 bases (59%) identical to a 
gb:GENBANK-ID:AF246990|acc:AF246990.1 mRNA from Chlamydomonas reinhardtii (Chlamydomonas 
reinhardtii flagellar autotomy protein Fa1p (FA1) mRNA, complete cds). Public nucleotide 
databases include all GenBank databases and the GeneSeq patent database. 

The disclosed NOV6 polypeptide (SEQ ID NO: 14) encoded by SEQ ID NO: 13 has 
881 amino acid residues and is presented in Table 6B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV6 has a signal peptide and is likely 
to be localized at the plasma membrane with a certainty of 0.4600. The signal peptide is 
predicted by SignalP to be cleaved between amino acids 27 and 28. 
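
The predicted cleavage position can be applied directly to the disclosed sequence to separate the signal peptide from the start of the mature protein. The sketch below is illustrative only; the function name split_signal_peptide is hypothetical, and the split simply follows the position predicted above rather than any experimentally determined processing site. It uses the first 60 residues of the NOV6 sequence in Table 6B.

```python
def split_signal_peptide(precursor: str, last_signal_residue: int):
    """Split a precursor after residue number last_signal_residue (1-based).

    Returns (signal_peptide, mature_protein).
    """
    precursor = "".join(precursor.split()).upper()
    if not 0 < last_signal_residue < len(precursor):
        raise ValueError("cleavage position must fall inside the sequence")
    return precursor[:last_signal_residue], precursor[last_signal_residue:]


# First 60 residues of the NOV6 protein (Table 6B, SEQ ID NO:14).
nov6_n_terminus = "MDNPQALPLFLLLASLVGILTLRASSGLQQTNFSSAFSSDSKSSSQGLGVEVPSIKPPSW"

# Predicted cleavage between amino acids 27 and 28.
signal_peptide, mature_start = split_signal_peptide(nov6_n_terminus, 27)
print(signal_peptide)
print(mature_start[:10])
```

On this prediction the mature NOV6 polypeptide would comprise the 854 residues that follow the 27-residue signal peptide of the 881-residue precursor.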



Table 6B. Encoded NOV6 protein sequence (SEQ ID NO: 14). 

MDNPQALPLFLLLASLVGILTLRASSGLQQTNFSSAFSSDSKSSSQGLGVEVPSIKPPSW 
KVPDQFLDSKASAGISDSSWFPEALSSNMSGSFWSNVSAEGQDLSPVSPFSETPGSEVFP 
DISDPQVPAKDPKPSFTVKTPASNISTQVSHTKLSVEAPDSKFSPDDMDLKLSAQSPESK 
FSAETHSAASFPQQVGGPLAVLVGTTIRLPLVPIPNPGPPTSLVVWRRGSKVLAAGGLGP 
GAPLISLDPAHRDHLRFDQARGVLELASAQLDDAGVYTAEVIRAGVSQQTHEFTVGVYEP 
LPQLSVQPKAPETEEGAAELRLRCLGWGPGRGELSWSRDGRALEAAESEGAETPRMRSEG 
DQLLIVRPVRSDHARYTCRVRSPFGHREAAADVSVFYGPDPPTITVSSDRDAAPARFVTA 
GSNVTLRCAAASRPPADITWSLADPAEAAVPAGSRLLLPAVGPGHAGTYACLAANPRTGR 
RRRSLLNLTVADLPPGAPQCSVEGGPGDRSLRFRCSWPGGAPAASLQFQGLPEGIRAGPV 
SSVLLAAVPAHPRLSGVPITCLARHLVATRTCTVTPEAPREVLLHPLVAETRLGEAEVAL 
EASGCPPPSRASWAREGRPLAPGGGSRLRLSQDGRKLHIGNFSLDWDLGNYSVLCSGALG 
AGGDQITLIGPSISSWRLQRARDAAVLTWDVERGALISSFEIQAWPDGPALGRTSTYRDW 
VSLLILGPQERSAVVPLPPRNPGTWTFRILPILGGQPGTPSQSRVYRAGPTLSHGAIAGI 
VLGSLLGLALLAVLLLLCICCLCRFRGKTPEKKKHPSTLVPVVTPSEKKMHSVTPVEISW 
PLDLKVPLEDHSSTRAYQKKSLPVFVQVRRCDFLAGKLLLI 



A search of sequence databases reveals that the NOV6 amino acid sequence has 76 of 
254 amino acid residues (29%) identical to, and 117 of 254 amino acid residues (46%) similar 
to, the 521 amino acid residue ptnr:SPTREMBL-ACC:Q61352 protein from Mus musculus 
(Mouse) (BILIARY GLYCOPROTEIN 1 PRECURSOR). Public amino acid databases 
include the GenBank databases, SwissProt, PDB and PIR. 

NOV6 is expressed in at least lung, ovary, squamous cell carcinoma, and fibrothecoma. 
This information was derived by determining the tissue sources of the sequences that were 
included in the invention including but not limited to SeqCalling sources, Public EST sources, 
Literature sources, and/or RACE sources. 

The disclosed NOV6 polypeptide has homology to the amino acid sequences shown in 
the BLASTP data listed in Table 6C. 

Table 6C. BLAST results for NOV6 

Gene Index/Identifier       Protein/Organism                                     Length (aa)  Identity (%)     Positives (%)    Expect

gi|312584|emb|CAA47697.1|   biliary glycoprotein [Mus musculus]                  458          68/237 (28%)     109/237 (45%)    1e-16
(X67280)

gi|14029256|gb|AAK52602.1|  CEA-related cell adhesion molecule 2 [Mus musculus]  520          68/237 (28%)     109/237 (45%)    3e-16
(AF287912)

gi|423398|pir||S34338       biliary glycoprotein F - mouse                       521          68/237 (28%)     109/237 (45%)    4e-16

gi|483309|pir||JC1509       biliary glycoprotein E - mouse                       458          68/237 (28%)     108/237 (44%)    5e-16

gi|109630|pir||S11626       carcinoembryonic antigen - mouse (fragment)          379          66/237 (27%)     106/237 (43%)    1e-15



Table 6D lists the domain descriptions from DOMAIN analysis results against NOV6. 
This indicates that the NOV6 sequence has properties similar to those of other proteins known 
to contain this domain. 



Table 6D. Domain Analysis of NOV6 

gnl|Smart|smart00408, IGc2, Immunoglobulin C-2 Type 

CD-Length = 63 residues, 93.7% aligned 

Score = 55.5 bits (132), Expect = 1e-08 



The disclosed NOV6 gene contains three immunoglobulin 
domains and has homology to mouse CD22, a B lymphocyte-restricted adhesion molecule, 
mouse colon biliary glycoprotein, and carcinoembryonic antigen. The immunoglobulin 
domain is found as a tandem repeat in Streptococcal cell surface proteins, such as the IgG 
binding proteins G and MIG. These proteins are type I membrane proteins that bind to the 
constant Fc region of IgG with high affinity. The N-terminus of MIG mediates binding to 
plasma proteinase inhibitor alpha 2-macroglobulin after complex formation with proteases. 

The human B lymphocyte-specific Ag, CD22, is a cell adhesion molecule expressed on 
the surface during a narrow window of B cell development, coincident with surface IgD. A 
ligand for CD22 has recently been identified on human T cells as the low molecular mass 
isoform of the leukocyte common Ag, CD45RO. CD22 has been reported to function in the 
regulation of both T and B cell activation in vitro. 

Carcinoembryonic antigen (CEA) is a widely used tumor marker, especially in the 
surveillance of colonic cancer patients. Although CEA is also present in some normal tissues, 
it is apparently expressed at higher levels in tumorous tissues than in corresponding normal
tissues. Carcinoembryonic antigen (CEA) expression is perhaps the most prevalent of
phenotypic changes observed in human cancer cells. Twenty-seven CEA cDNA clones were
isolated from a human colon adenocarcinoma cell line. Most of these clones are full length and
consist of a number (usually three) of surprisingly similar long (534 base pairs) repeats
between a 5' end of 520 base pairs and a 3' end with three different termination points. The
predicted translation product of these clones consists of a processed signal sequence of 34
amino acids, an amino-terminal sequence of 107 amino acids, which includes the known
terminal amino acid sequence of CEA, three repeated domains of 178 amino acids each, and a
membrane-anchoring domain of 27 amino acids, giving a total of 702 amino acids and a
molecular weight of 72,813 for the mature protein. The repeated domains have conserved
features, including the first 67 amino acids at their N termini and the presence of four cysteine
residues. Comparison with the amino acid sequences of other proteins reveals homology of
the repeats with various members of the immunoglobulin supergene family, particularly the
human T-cell receptor gamma chain. CEA cDNA clones in the SP-65 vector were shown to
produce transcripts in vitro which could be translated in vitro to yield a protein of molecular
weight 73,000 which in turn could be precipitated with CEA-specific antibodies (See Schrewe
H et al., Mol Cell Biol 1990 Jun;10(6):2738-48).
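
The domain lengths quoted in the preceding paragraph can be checked against the stated totals. The short Python sketch below simply adds the figures reported in the passage; the variable names are illustrative only and are not taken from Schrewe et al.

```python
# Domain lengths (in amino acids) as quoted in the passage above.
signal_sequence = 34
amino_terminal = 107
repeat_length = 178
repeat_count = 3
membrane_anchor = 27

# Sum of all listed segments of the predicted translation product.
total_aa = (signal_sequence + amino_terminal
            + repeat_count * repeat_length + membrane_anchor)

# Each amino acid is encoded by one three-nucleotide codon.
repeat_bp = repeat_length * 3

print(total_aa)
print(repeat_bp)
```

Under these figures the segments sum to the 702 amino acids stated in the text, and one 178 amino acid repeat corresponds to the 534 base pair repeat length quoted for the cDNA clones.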

The biliary glycoprotein (BGP)-encoding gene is a member of the human 
carcinoembryonic antigen (CEA) gene family. McCuaig et al. cloned several mouse Bgp
cDNAs from an outbred CDR-1 mouse colon cDNA library, as well as by reverse
transcription-PCR amplification of colon RNA. The distinguishing features of the deduced
Bgp protein isoforms are found in the two divergent N-terminal domains, the highly conserved
internal C2-set immunoglobulin domains, and an intracytoplasmic domain of either 10 or 73
amino acids (aa). The cDNA structures suggest that these mRNAs are produced through
alternative splicing of a Bgp gene and the usage of multiple transcriptional terminators. The
Bgp deduced aa sequences are highly homologous to several well characterized rat hepatocyte
proteins such as the cell CAM105/ecto-ATPase/pp120/HA4 proteins.
Oligodeoxyribonucleotide probes representing the various cDNA isoform domains revealed
predominant transcripts of 1.8, 3.1 and 4.0 kb on Northern analyses of mouse colon RNA;
some of these bands are actually composed of several co-migrating transcripts. The transcripts
encoding the long intracytoplasmic-tailed Bgp proteins are expressed at one-tenth the relative
abundance of the shorter-tailed species. The expression of the many Bgp isoforms at the
surface of epithelial cells, such as colon, suggests that these proteins play a determinant role,
through self- or heterologous contact, in renewal and/or differentiation of their epithelia (See
McCuaig et al., Gene 1993 May 30;127(2):173-83).

The disclosed NOV6 nucleic acid of the invention encoding a CD22-like protein 
includes the nucleic acid whose sequence is provided in Table 6A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 6A while still encoding a protein that maintains
its CD22-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 41 percent of the
bases may be so changed.

The disclosed NOV6 protein of the invention includes the CD22-like protein whose 
sequence is provided in Table 6B. The invention also includes a mutant or variant protein any
of whose residues may be changed from the corresponding residue shown in Table 6B while
still encoding a protein that maintains its CD22-like activities and physiological functions, or a
functional fragment thereof. In the mutant or variant protein, up to about 71 percent of the
residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this CD22-like protein 
(NOV6) may function as a member of a "CD22 family". Therefore, the NOV6 nucleic acids 
and proteins identified here may be useful in potential therapeutic applications implicated in 
(but not limited to) various pathologies and disorders as indicated below. The potential
therapeutic applications for this invention include, but are not limited to: protein therapeutic,
small molecule drug target, antibody target (therapeutic, diagnostic, drug targeting/cytotoxic 
antibody), diagnostic and/or prognostic marker, gene therapy (gene delivery/gene ablation), 
research tools, tissue regeneration in vivo and in vitro of all tissues and cell types composing 
(but not limited to) those defined here. 





The NOV6 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the CD22-like protein
(NOV6) may be useful in gene therapy, and the CD22-like protein (NOV6) may be useful 
when administered to a subject in need thereof. By way of nonlimiting example, the 
compositions of the present invention will have efficacy for treatment of patients suffering 
from endometriosis, fertility, systemic lupus erythematosus, autoimmune disease, asthma, 
emphysema, scleroderma, allergy, ARDS, or other pathologies or conditions. The NOV6 
nucleic acid encoding the CD22-like protein of the invention, or fragments thereof, may 
further be useful in diagnostic applications, wherein the presence or amount of the nucleic acid 
or the protein are to be assessed. 

NOV6 nucleic acids and polypeptides are further useful in the generation of antibodies 
that bind immuno-specifically to the novel NOV6 substances for use in therapeutic or 
diagnostic methods. These antibodies may be generated according to methods known in the 
art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies" 
section below. The disclosed NOV6 proteins have multiple hydrophilic regions, each of 
which can be used as an immunogen. These novel proteins can be used in assay systems for 
functional analysis of various human disorders, which will help in understanding of pathology 
of the disease and development of new drug targets for various disorders. 

NOV7 

A disclosed NOV7 nucleic acid of 8589 nucleotides (also referred to as CG57595-01) 
encoding a MEGF8-like protein is shown in Table 7A. Putative untranslated regions upstream 
and/or downstream from the coding region, if any, are underlined, and the start and stop 
codons are in bold letters. 



Table 7A. NOV7 nucleotide sequence (SEQ ID NO: 15). 

TGCAGGAGGCGGCGATGGCCCTGGGCAAGGTTCTGGCCATGGCACTGGTTTTGGCCTTGGCCGTGCTGGG 
GTCGCTGTCCCCTGGGGCCCGGGCGGGGGACTGCAAGGGGCAGCGGCAGGTGCTGCGGGAGGCGCCAGGC 
TTCGTGACGGATGGTGCGGGCAACTACAGCGTCAATGGCAACTGCGAGTGGCTCATCGAGGCCCCAAGCC 
CCCAGCACCGGATCCTGCTGGACTTCCTTTTCCTGGACACAGAGTGCACGTATGACTACCTGTTCGTGTA 
TGACGGTGACTCCCCGCGAGGGCCGCTGCTTGCCAGTCTAAGTGGGAGCACCCGACCTCCGCCCATCGAA 
GCTTCCTCAGGCAAGATGCTGCTGCACCTCTTCAGTGATGCCAACTACAACCTGCTGGGCTTTAACGCCT 
CATTCCGCTTCTCCCTGTGCCCGGGTGGCTGCCAGAGCCACGGGCAGTGCCAGCCACCGGGTGTGTGTGC 
CTGCGAGCCGGGCTGGGGGGGTCCTGACTGTGGCCTGCAGGAGTGCTCAGCCTACTGTGGCAGCCACGGC 
ACCTGCGCCTCGCCCCTGGGACCATGCCGCTGTGAGCCTGGCTTCTTGGGACGTGCCTGTGACCTGCACC 
TGTGGGAGAACCAGGGGGCTGGGTGGTGGCACAACGTGAGTGCCAGGGACCCTGCCTTCTCTGCCCGTAT 
TGGGGCAGCTGGCGCCTTCCTGTCCCCACCAGGGCTGCTGGCAGTTTTCGGAGGCCAGGACCTCAACAAT 
GCCCTGGGTGACCTCGTCCTATACAACTTCTCCGCCAACACCTGGGAGTCTTGGGACCTGAGTCCTGCCC 
CGGCTGCCCGTCACTCCCATGTGGCCGTGGCCTGGGCCGGCTCCCTGGTACTGATGGGTGGTGAGCTGGC 
TGACGGCTCGCTCACCAACGACGTGTGGGCCTTCAGTCCACTGGGCAGGGGCCACTGGGAGCTCCTGGCA 
CCACCTGCCTCCAGCTCCTCGGGGCCCCCAGGCCTGGCAGGTCACGCGGCTGCCCTGGTGGATGATGTCT
GGCTATATGTGTCTGGAGGCCGCACCCCGCACGACCTCTTCTCCTCTGGCCTCTTCCGTTTCCGCCTTGA 
CAGCACCAGCGGGGGCTATTGGGAGCAGGTGATTCCGGCAGGCGGACGGCCCCCTGCTGCCACTGGCCAC 
TCCATGGTGTTCCATGCCCCCTCCCGTGCCCTGCTGGTCCATGGTGGACACCGGCCCTCCACTGCCCGGT 
TCTCTGTGCGAGTGAACTCCACTGAGCTTTTCCACGTGGATCGGCATGTGTGGACGACGCTGAAGGGGCG 
GGATGGGCTTCAGGGCCCAAGGGAGCGAGCCTTCCACACAGCCAGTGTTCTGGGCAATTACATGGTGGTC
TATGGGGGCAATGTGCACACCCATTACCAGGAGGAAAAGTGCTACGAAGATGGCATCTTCTTCTACCACC 
TTGGCTGCCATCAATGGGTGTCAGGAGCTGAGCTTGCCCCGCCAGGAACCCCTGAGGGCCGAGCAGCGCC 
TCCCAGTGGTCGGTACTCACATGTAGCTGCGGTGCTTGGTGGCAGCGTCCTGTTGGTGGCTGGGGGGTAC 
AGCGGCCGGCCCCGTGGGGACTTGATGGCGTACAAGGTGCCCCCCTTTGTGTTCCAGGCACCTGCCCCTG 
ACTACCACTTGGACTACTGCTCCATGTACACAGACCACAGCGTCTGCTCCCGGGACCCGGAATGCAGTTG 
GTGCCAAGGAGCCTGCCAAGCTGCACCCCCTCCTGGGACCCCCTTGGGGGCTTGTCCAGCCGCCAGCTGC 
CTGGGCCTGGGCCGCCTCCTGGGTGACTGCCAGGCCTGCCTGGCCTTCAGCAGCCCCACAGCCCCTCCAC 
GGGGACCTGGCACCCTGGGCTGGTGCGTGCACAATGAGAGCTGCCTCCCTAGGCCTGAGCAGGCCCGCTG 
CCGAGGGGAGCAGATCTCAGGCACTGTGGGCTGGTGGGGGCCTGCGCCTGTCTTCGTCACGTCCCTGGAG 
GCCTGCGTCACCCAGAGCTTCCTGCCTGGCCTGCACTTGCTCACCTTTCAGCAGCCGCCCAATACCTCCC 
AGCCTGACAAGGTCTCAATTGTCCGCAGCACGACCATCACCCTAACACCCAGCGCAGAGACAGATGTGTC
CCTGGTCTACCGTGGCTTCATCTACCCAATGCTGCCTGGAGGGCCAGGTGGACCAGGGGCTGAGGACGTG 
GCCGTGTGGACGCGGGCCCAGCGCCTACACGTCCTGGCCCGGATGGCCCGTGGCCCTGACACGGAGAACA 
TGGAGGAGGTGGGGCGCTGGGTGGCTCATCAGGAGAAGGAGACGCGGCGGCTGCAGCGCCCTGGGTCTGC 
TCGCCTCTTCCCTCTGCCTGGGCGGGACCACAAGTATGCAGTAGAGATCCAGGGCCAGCTCAATGGCTCG 
GCAGGCCCTGGGCACAGCGAGCTAACTCTGCTGTGGGATCGGACTGGTGTGCCAGGAGGCAGCGAGATCT 
CCTTCTTCTTCCTGGAGCCCTACCGCTCGTCGTCCTGCACCTCCTATTCTTCCTGCCTGGGCTGCTTGGC 
AGACCAGGGCTGTGGCTGGTGCCTGACCAGTGCCACCTGCCACCTGCGCCAGGGCGGAGCCCATTGCGGG 
GATGACGGGGCTGGTGGGTCCCTGCTGGTGCTGGTGCCTACCCTCTGCCCACTCTGCGAGGAGCATCGGG 
ACTGCCACGCCTGCACCCAGGACCCCTTCTGTGAGTGGCATCAGAGCACCAGCCGCAAAGGGGACGCGGC 
ATGCAGCCGGCGGGGCCGGGGTCGGGGTGCCCTGAAGAGTCCAGAGGAGTGTCCCCCGCTCTGCAGCCAG 
CGACTGACCTGTGAGGACTGCCTGGCCAACTCTAGCCAGTGCGCCTGGTGCCAGTCCACCCACACCTGCT 
TCCTGTTTGCTGCCTACTTGGCCCGGTACCCACACGGGGGCTGTCGAGGCTGGGACGACAGTGTACACTC 
GGAGCCACGGTGCCGGAGCTGCGATGGCTTCCTGACCTGCCATGAGTGTCTGCAGAGCCACGAGTGTGGC 
TGGTGTGGCAATGAGGACAACCCCACACTGGGACGGTGCCTACAGGGGGACTTCTCAGGGCCCCTCGGTG 
GGGGTAACTGCTCCCTGTGGGTGGGGGAGGGCCTGGGGCTTCCCGTGGCCCTCCCTGCCCGCTGGGCATA 
CGCCCGCTGTCCTGACGTGGATGAGTGTCGCCTGGGCCTGGCCCGGTGCCACCCGCGGGCGACCTGCCTG 
AACACGCCCCTCAGCTACGAGTGTCACTGCCAGCGGGGCTACCAGGGTGATGGCATCTCACACTGCAACC 
GCACGGATGGCATCTCACACTGCAACCGCACGTGCTTGGAGGACTGTGGCCATGGTGTGTGCAGTGGCCC 
CCCGGACTTTACCTGCGTGTGTGACCTAGGCTGGACATCAGACCTGCCCCCTCCCACACCCGCCCCGGGT 
CCGCCAGCCCCCCGCTGCTCCCGGGACTGTGGCTGCAGCTTCCACAGCCACTGCCGCAAGCGGGGCCCTG 
GCTTCTGCGACGAGTGCCAGGACTGGACATGGGGGGAGCACTGCGAACGATGCCGGCCCGGCAGCTTCGG 
CAACGCCACAGGCTCTAGGGGCTGCCGGCCCTGCCAGTGCAACGGGCACGGGGACCCACGCCGTGGCCAC 
TGCGACAACCTCAGTGGGCTCTGCTTCTGCCAGGACCACACCGAGGGTGCCCACTGCCAGCTCTGCTCCC 
CAGGCTATTATGGGGATCCCCGGGCCGGTGGTTCCTGCTTTCGGGAGTGTGGGGGTCGCGCCCTCCTCAC 
CAACGTGTCCTCAGTGGCACTGGGCTCACGCCGGGTCGGGGGGCTGCTGCCTCCAGGTGGCGGGGCTGCA 
AGAGCCGGGCCTGGCCTGTCCTACTGTGTGTGGGTTGTCTCGGCCACTGAGGAGCTACAGCCCTGTGCTC 
CCGGGACCCTCTGTCCCCCACTCACCCTCACCTTCTCCCCCGACAGCAGCACCCCCTGCACGCTGAGCTA 
CGTCCTGGCGTTTGATGGATTCCCACGCTTCCTGGACACTGGTGTTGTCCAGTCGGACCGCAGCCTCATA 
GCTGCCTTCTGCGGCCAGCGACGGGACAGGCCCCTCACTGTTCAGGCCCTGTCTGGGCTGCTCGTGCTGC 
ACTGGGAGGCCAATGGCTCCTCATCCTGGGGCTTCAATGCTTCGGTGGGCTCTGCCCGCTGTGGGTCAGG 
GGGCCCCGGGAGCTGTCCCGTCCCCCAGGAATGCGTGCCCCAGGACGGTGCTGCAGGTGCGGGGCTCTGC 
CGATGTCCTCAGGGCTGGGCTGGCCCACACTGCCGCATGGCTCTGTGTCCTGAGAACTGCAATGCCCACA 
CTGGGGCAGGAACTTGTAACCAGAGCCTGGGTGTGTGCATCTGTGCCGAGGGCTTCGGGGGCCCCGACTG 
CGCCACCAAGCTGGATGGCGGGCAGCTGGTCTGGGAGACCCTCATGGACAGCCGCCTCTCAGCCGACACT 
GCCAGCCGCTTCCTGCACCGCCTGGGCCACACCATGGTGGATGGACCCGATGCCACCTTGTGGATGTTTG 
GGGGCCTGGGCCTGCCCCAGGGGCTGCTGGGAAACCTGTACAGGTACTCAGTGAGTGAGCGGCGGTGGAC 
ACAGATGCTGGCGGGAGCCGAGGACGGGGGCCCAGGCCCATCGCCCCGCTCCTTCCATGCAGCCGCATAT 
GTGCCCGCTGGCCGTGGTGCCATGTATCTGCTGGGGGGACTTACCGCTGGAGGCGTCACCCGTGATTTCT 
GGGTCCTCAACCTCACCACCCTGCAATGGCGGCAGGAGAAGGCCCCCCAGACCGTGGAGCTGCCAGCCGT 
TGCTGGTCACACCCTTACTGCCCGCCGAGGCCTGTCTCTGCTCCTGGTGGGCGGTTACTCCCCGGAAAAT 
GGCTTCAACCAGCAGCTGCTGGAGTACCAGCTGGCAACCGGCACCTGGGTGTCAGGAGCCCAGAGTGGGA 
CACCCCCCACAGGTCTCTATGGTCACTCTGCTGTCTACCACGAGGCCACCGACTCCCTCTACGTGTTTGG 
GGGGTTCCGATTCCATGTGGAGCTGGCGGCCCCATCCCCCGAGCTCTACTCCCTGCACTGTCCTGACCGC 
ACCTGGAGTCTGCTGGCCCCTTCTCAGGGGGCAAAGCGAGATCGTATGAGGAATGTGCGTGGCTCATCTC 
GGGGTCTGGGCCAAGTTCCTGGGGAGCAGCCTGGGTCATGGGGGTTCCGGGAAGTCAGGAAGAAGATGGC 
TCTGTGGGCTGCTCTTGCTGGTACAGGAGGTTTCCTGGAGGAAATCTCACCTCACCTGAAGGAGCCCCGC 
CCCCGGCTTTTCCACGCCTCAGCCCTGTTAGGGGACACCATGGTGGTTCTTGGGGGGCGCTCGGACCCTG 
ACGAGTTCAGCAGCGACGTTCTGCTCTACCAGGTCAACTGCAATGCCTGGCTTCTGCCCGACCTCACCCG 
CTCGGCCTCTGTGGGGCCCCCAATGGAGGAGTCTGTGGCCCATGCTGTGGCAGCAGTCGGGAGCCGCCTG 
TATATCTCTGGGGGTTTCGGGGGAGTGGCCCTGGGCCGCCTGCTGGCACTGACCCTGCCCCCTGACCCCT 
GCCGCCTGCTGTCCTCACCTGAAGCTTGTAACCAGTCTGGGGCCTGCACCTGGTGCCATGGGGCCTGCTT 
GTCCGGGGATCAGGCCCACAGGCTGGGCTGCGGGGGCTCCCCCTGCTCCCCAATGCCTCGCTCCCCGGAG 
GAATGTCGACGTCTCCGGACCTGCAGTGAGTGCCTGGCCCGCCATCCTCGGACCCTGCAACCTGGAGATG
GAGAGGCGTCCACCCCCCGCTGTAAGTGGTGTACCAACTGCCCCGAAGGTGCTTGCATTGGACGCAATGG 
GTCCTGCACCTCTGAGAATGACTGTCGGATCAACCAGCGAGAGGTCTTCTGGGCAGGGAACTGCTCCGAG 
GCTGCGTGCGGGGCTGCTGACTGCGAGCAGTGCACGCGGGAGGGCAAGTGCATGTGGACGCGGCAGTTCA 
AGAGGACAGGGGAGACCCGCCGCATCCTCTCCGTGCAGCCCACCTATGACTGGACGTGCTTCAGCCACTC 
TCTGCTGAATGTGTCCCCCATGCCGGTGGAATCATCACCCCCACTGCCCTGCCCCACCCCTTGTCACCTC 
CTACCCAACTGTACCTCCTGCCTGGACTCTAAGGGAGCAGATGGGGGCTGGCAGCACTGTGTTTGGAGCA 
GCAGCCTGCAGCAGTGTCTGAGCCCTTCCTACCTGCCCCTGCGATGTATGGCCGGAGGCTGTGGGCGGCT 
GCTCCGGGGACCTGAGAGCTGCTCCCTGGGCTGTGCTCAGGCAACTCAGTGCGCCTTGTGCCTGCGGCGC 
CCCCATTGCGGCTGGTGTGCCTGGGGGGGCCAGGATGGGGGTGGCCGCTGCATGGAGGGTGGACTCAGCG 
GCCCCCGTGATGGGCTGACATGTGGGCGTCCGGGGGCCTCCTGGGCCTTCCTGTCCTGCCCCCCTGAGGA 
CGAGTGTGCAAACGGGCACCACGACTGCAACGAGACGCAGAATTGCCACGACCAGCCCCACGGCTATGAG 
TGCAGCTGCAAGACCGGCTATACCATGGACAACATGACAGGGCTGTGCCGCCCTGTGTGCGCCCAGGGCT 
GCGTGAACGGCTCATGTGTGGAGCCCGACCACTGCCGCTGCCACTTTGGCTTTGTGGGCCGCAACTGCTC 
CACGGAATGCCGCTGCAACCGCCACAGTGAATGCGCTGGTGTTGGGGCGCGTGACCACTGCTTGCTCTGC 
CGCAACCACACCAAGGGCAGCCACTGTGAGCAGTGCCTCCCGCTGTTTGTGGGTTCAGCTGTCGGAGGCG 
GGACCTGCCGGCCCTGCCACGCCTTTTGTCGTGGAAATAGCCACATCTGCATCTCCAGGAAGGAGTTACA 
AATGTCCAAGGGAGAGCCAAAGAAGTACTCACTGGACCCAGAGGAGATTGAAAACTGGGTGACAGAGGGT
CCTAGTGAAGACGAGGCCGTGTGCGTGAACTGCCAGAATAACAGCTATGGGGAGAAATGCGAGAGCTGCC 
TGCAGGGCTACTTCCTCCTGGACGGGAAGTGCACCAAATGCCAGTGTAATGGCCACGCGGACACATGTAA 
CGAGCAGGATGGGACGGGCTGTCCATGTCAGAATAACACAGAGACGGGCACATGCCAGGGCAGCTCCCCC 
AGTGACCGTCGAGACTGCTACAAGTACCAGTGCGCCAAGTGCCGGGAATCATTTCACGGGAGTCCGCTGG 
GCGGCCAGCAGTGCTACCGCCTCATCTCGGTGGAGCAGGAGTGCTGCCTGGACCCCACGTCCCAGACCAA 
CTGCTTCCATGAGCCCAAACGCCGGGCGCTAGGCCCCGGCCGCACTGTCCTCTTTGGCGTGCAGCCCAAA 
TTCACCAACGTGGACATCCGCCTGACGCTGGACGTGACCTTCGGGGCCGTGGACCTCTATGTCTCCACCT 
CCTATGACACCTTCGTGGTCCGTGTGGCCCCTGACACTGGCGTCCATACTGTACACATCCAGCCACCCCC 
AGCCCCACCACCTCCACCACCCCCTGCAGATGGTGGGCCCCGGGGGGCTGGGGATCCAGGAGGAGCAGGG 
GCCAGCAGTGGGCCGGGCGCCCCAGCAGAGCCACGGGTACGGGAGGTATGGCCGCGGGGCCTGATTACCT 
ACGTGACGGTGACGGAGCCGTCGGCAGTGCTGGTGGTCCGCGGCGTGCGGGACCGGCTGGTCATCACCTA 
CCCACACGAGCACCATGCCCTCAAGTCGAGCCGCTTCTACCTGCTGCTGCTGGGCGTGGGAGACCCAAGT 
GGGCCCGGCGCCAACGGCTCAGCCGACTCGCAGGGCCTGCTCTTCTTCCGGCAGGACCAGGCCCACATTG 
ACCTGTTTGTCTTCTTCTCCGTCTTCTTCTCCTGCTTCTTCCTCTTCCTCTCACTCTGTGTGCTCCTCTG 
GAAGGCCAAGCAGGCTCTGGACCAGCGGCAGGAGCAGCGCCGGCACTTGCAGGAGATGACCAAGATGGCC 
AGCCGCCCCTTCGCCAAGGTCACCGTCTGCTTCCCACCTGACCCTACTGCCCCGGCCTCCGCCTGGAAGC 
CGGCTGGGCTCCCACCTCCCGCCTTCCGCCGCTCTGAGCCCTTCCTGGCACCCCTGCTGCTGACAGGGGC 
CGGTGGGCCCTGGGGACCCATGGGAGGGGGCTGCTGCCCACCAGCCATCCCCGCCACCACTGCTGGGCTG 
CGAGCTGGGCCCATCACTCTCGAGCCCACAGAAGATGGCATGGCTGGCGTGGCCACACTGCTGCTCCAGC 
TGCCTGGCGGGCCCCATGCACCCAACGGCGCCTGCCTGGGGTCAGCCCTCGTCACACTGCGGCACAGGCT 
GCACGAGTACTGTGGGGGTGGTGGGGGTGCTGGGGGCAGTGGGCATGGGACTGGTGCGGGCCGGAAGGGA 
CTGTTGAGCCAGGACAACCTCACCAGCATGTCCCTCTGACATGCCCAGG



In a search of public sequence databases, the NOV7 nucleic acid sequence, located on
chromosome 19, has 5224 of 5224 bases (100%) identical to a
gb:GENBANK-ID:AB011541|acc:AB011541.1 mRNA from Homo sapiens (Homo sapiens mRNA for
MEGF8, partial cds). Public nucleotide databases include all GenBank databases and the 
GeneSeq patent database. 

The disclosed NOV7 polypeptide (SEQ ID NO: 16) encoded by SEQ ID NO: 15 has 
2854 amino acid residues and is presented in Table 7B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV7 has a signal peptide and is likely 
to be localized at the plasma membrane with a certainty of 0.4600. The most likely cleavage 
site for a NOV7 peptide is between amino acids 27 and 28. 
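
As a consistency check on the lengths given above, the following Python sketch translates the opening of the Table 7A coding region with the standard genetic code and compares the stated nucleotide and residue counts. The function name `translate` is hypothetical, and treating the first ATG in the sequence as the start codon is an assumption read from the sequence itself rather than stated in the text.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt: str) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(nt) - 2, 3):
        aa = CODON_TABLE[nt[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First line of Table 7A (SEQ ID NO:15).
first_line = ("TGCAGGAGGCGGCGATGGCCCTGGGCAAGGTTCTGGCCATGGCACTGGTTTTGGCC"
              "TTGGCCGTGCTGGG")

# Assumption: the first ATG is the start codon.
start = first_line.index("ATG")
n_terminus = translate(first_line[start:start + 42])  # first 14 codons

# Lengths as stated in the text.
total_nt = 8589
protein_aa = 2854
coding_nt = (protein_aa + 1) * 3  # codons plus one stop codon
untranslated_nt = total_nt - coding_nt

print(n_terminus, coding_nt, untranslated_nt)
```

With these inputs the translated N-terminus agrees with the first residues listed in Table 7B, and the coding region (2854 codons plus a stop codon) accounts for 8565 of the 8589 nucleotides, leaving 24 nucleotides of putative untranslated sequence.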



Table 7B. Encoded NOV7 protein sequence (SEQ ID NO:16). 

MALGKVLAMALVLA 

IEAPSPQHRILLDFLFLDTECTYDYLFVYDGDSPRGPLLASLSGSTRPPPIEASSGKMLL 
HLFSDANYNLLGFNASFRFSLCPGGCQSHGQCQPPGVCACEPGWGGPDCGLQECSAYCGS

HGTCASPLGPCRCEPGFLGRACDLHLWENQGAGWWHNVSARDPAFSARIGAAGAFLSPPG 

LLAVFGGQDLNNALGDLVLYNFSANTWESWDLSPAPAARHSHVAVAWAGSLVLMGGELAD

GSLTNDVWAFSPLGRGHWELLAPPASSSSGPPGLAGHAAALVDDVWLYVSGGRTPHDLFS

SGLFRFRLDSTSGGYWEQVIPAGGRPPAATGHSMVFHAPSRALLVHGGHRPSTARFSVRV 

NSTELFHVDRHVWTTLKGRDGLQGPRERAFHTASVLGNYIWVYGGNVHTHYQE 

IFFYHLGCHQWVSGAELAPPGTPEGRAAPPSGRYSHVAAVLGGSVLLVAGGYSGRPRGDL 

MAYKVPPFVFQAPAPDYHLDYCSMYTDHSVCSRDPECSWCQGACQAAPPPGTPLGACPAA 

SCLGLGRLLGDCQACLAFSSPTAPPRGPGTLGWCVHNESCLPRPEQARCRGEQISGTVGW 

WGPAPVFVTSLEACVTQSFLPGLHLLTFQQPPNTSQPDKVSIVRSTTITLTPSAETDVSL 

VYRGFIYPMLPGGPGGPGAEDVAVWTRAQRLHVLARMARGPDTENMEEVGRWVAHQEKET 

RRLQRPGSARLFPLPGRDHKYAVEIQGQLNGSAGPGHSELTLLWDRTGVPGGSEISFFFL

EPYRSSSCTSYSSCLGCLADQGCGWCLTSATCHLRQGGAHCGDDGAGGSLLVLVPTLCPL 

CEEHRDCHACTQDPFCEWHQSTSRKGDAACSRRGRGRGALKSPEECPPLCSQRLTCEDCL 

ANSSQCAWCQSTHTCFLFAAYLARYPHGGCRGWDDSVHSEPRCRSCDGFLTCHECLQSHE 

CGWCGNEDNPTLGRCLQGDFSGPLGGGNCSLWVGEGLGLPVALPARWAYARCPDVDECRL 

GLARCHPRATCLNTPLSYECHCQRGYQGDGISHCNRTDGISHCNRTCLEDCGHGVCSGPP 

DFTCVCDLGWTSDLPPPTPAPGPPAPRCSRDCGCSFHSHCRKRGPGFCDECQDWTWGEHC

ERCRPGSFGNATGSRGCRPCQCNGHGDPRRGHCDNLSGLCFCQDHTEGAHCQLCSPGYYG 

DPRAGGSCFRECGGRALLTNVSSVALGSRRVGGLLPPGGGAARAGPGLSYCVWVVSATEE

LQPCAPGTLCPPLTLTFSPDSSTPCTLSYVLAFDGFPRFLDTGVVQSDRSLIAAFCGQRR

DRPLTVQALSGLLVLHWEANGSSSWGFNASVGSARCGSGGPGSCPVPQECVPQDGAAGAG 

LCRCPQGWAGPHCRMALCPENCNAHTGAGTCNQSLGVCICAEGFGGPDCATKLDGGQLVW 

ETLMDSRLSADTASRFLHRLGHTMVDGPDATLWMFGGLGLPQGLLGNLYRYSVSERRWTQ 

MLAGAEDGGPGPSPRSFHAAAYVPAGRGAMYLLGGLTAGGVTRDFWVLNLTTLQWRQEKA

PQTVELPAVAGHTLTARRGLSLLLVGGYSPENGFNQQLLEYQLATGTWVSGAQSGTPPTG

LYGHSAVYHEATDSLYVFGGFRFHVELAAPSPELYSLHCPDRTWSLLAPSQGAKRDRMRN

VRGSSRGLGQVPGEQPGSWGFREVRKKMALWAALAGTGGFLEEISPHLKEPRPRLFHASA

LLGDTMVVLGGRSDPDEFSSDVLLYQVNCNAWLLPDLTRSASVGPPMEESVAHAVAAVGS

RLYISGGFGGVALGRLLALTLPPDPCRLLSSPEACNQSGACTWCHGACLSGDQAHRLGCG

GSPCSPMPRSPEECRRLRTCSECLARHPRTLQPGDGEASTPRCKWCTNCPEGACIGRNGS

CTSENDCRINQREVFWAGNCSEAACGAADCEQCTREGKCMWTRQFKRTGETRRILSVQPT 

YDWTCFSHSLLNVSPMPVESSPPLPCPTPCHLLPNCTSCLDSKGADGGWQHCVWSSSLQQ 

CLSPSYLPLRCMAGGCGRLLRGPESCSLGCAQATQCALCLRRPHCGWCAWGGQDGGGRCM 

EGGLSGPRDGLTCGRPGASWAFLSCPPEDECANGHHDCNETQNCHDQPHGYECSCKTGYT 

MDNMTGLCRPVCAQGCVNGSCVEPDHCRCHFGFVGRNCSTECRCNRHSECAGVGARDHCL 

LCRNHTKGSHCEQCLPLFVGSAVGGGTCRPCHAFCRGNSHICISRKELQMSKGEPKKYSL 

DPEEIENWVTEGPSEDEAVCVNCQNNSYGEKCESCLQGYFLLDGKCTKCQCNGHADTCNE 

QDGTGCPCQNNTETGTCQGSSPSDRRDCYKYQCAKCRESFHGSPLGGQQCYRLISVEQEC

CLDPTSQTNCFHEPKRRALGPGRTVLFGVQPKFTNVDIRLTLDVTFGAVDLYVSTSYDTF 

VVRVAPDTGVHTVHIQPPPAPPPPPPPADGGPRGAGDPGGAGASSGPGAPAEPRVREVWP

RGLITYVTVTEPSAVLVVRGVRDRLVITYPHEHHALKSSRFYLLLLGVGDPSGPGANGSA

DSQGLLFFRQDQAHIDLFVFFSVFFSCFFLFLSLCVLLWKAKQALDQRQEQRRHLQEMTK 

MASRPFAKVTVCFPPDPTAPASAWKPAGLPPPAFRRSEPFLAPLLLTGAGGPWGPMGGGC 

CPPAIPATTAGLRAGPITLEPTEDGMAGVATLLLQLPGGPHAPNGACLGSALVTLRHRLH 

EYCGGGGGAGGSGHGTGAGRKGLLSQDNLTSMSL



A search of sequence databases reveals that the NOV7 amino acid sequence has 1737
of 1737 amino acid residues (100%) identical to, and 1737 of 1737 amino acid residues 
(100%) similar to, the 1737 amino acid residue ptnr:SPTREMBL-ACC:O75097 protein from 
Homo sapiens (Human) (MEGF8). Public amino acid databases include the GenBank 
databases, SwissProt, PDB and PIR. 

NOV7 is expressed in at least kidney, nervous system, brain, and lung. This information
was derived by determining the tissue sources of the sequences that were included in the
invention including but not limited to SeqCalling sources, Public EST sources, Literature
sources, and/or RACE sources.

The disclosed NOV7 polypeptide has homology to the amino acid sequences shown in 
the BLASTP data listed in Table 7C. 



Table 7C. BLAST results for NOV7

Gene Index/Identifier                     Protein/Organism                               Length (aa)  Identity (%)      Positives (%)     Expect
gi|7513144|pir||T00209                    MEGF8 protein - human (fragment)               1737         1737/1737 (100%)  1737/1737 (100%)  0.0
gi|14756108|ref|XP_029883.1| (XM_029883)  EGF-like-domain, multiple 4 [Homo sapiens]     1214         1171/1173 (99%)   1171/1173 (99%)   0.0
gi|6681364|dbj|BAA88689.1| (AB011534)     MEGF8 [Rattus norvegicus]                      874          836/874 (95%)     850/874 (96%)     0.0
gi|10728654|gb|AAF52597.2| (AE003619)     CG7466 gene product [Drosophila melanogaster]  2820         786/2379 (33%)    1107/2379 (46%)   0.0
gi|17862106|gb|AAL39530.1| (AY069385)     LD09511p [Drosophila melanogaster]             779          192/466 (41%)     263/466 (56%)     3e-94


Tables 7D-F list the domain descriptions from DOMAIN analysis results against 
NOV7. This indicates that the NOV7 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 7D. Domain Analysis of NOV7

gnl|Smart|smart00042, CUB, Domain first found in C1r, C1s, uEGF, and
bone morphogenetic protein. This domain is found mostly among
developmentally-regulated proteins. Spermadhesins contain only this
domain.

CD-Length = 114 residues, 82.5% aligned

Score = 84.0 bits (206), Expect = 1e-16

Table 7E. Domain Analysis of NOV7

gnl|Pfam|pfam00053, laminin_EGF, Laminin EGF-like (Domains III and V).
This family is like pfam00008 but has 8 conserved cysteines instead of
6.

CD-Length = 49 residues, 87.8% aligned

Score = 47.4 bits (111), Expect = 1e-05

Table 7F. Domain Analysis of NOV7

gnl|Smart|smart00179, EGF_CA, Calcium-binding EGF-like domain

CD-Length = 41 residues, 90.2% aligned

Score = 38.5 bits (88), Expect = 0.005
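
The "% aligned" values in Tables 7D-F can be converted into approximate residue counts. The Python sketch below assumes that each percentage is the aligned fraction of the conserved-domain (CD) length, which the tables imply but do not state; the dictionary keys and the helper name `aligned_residues` are illustrative shorthand.

```python
# (CD length in residues, percent aligned) as listed in Tables 7D-F.
domain_hits = {
    "CUB": (114, 82.5),
    "laminin_EGF": (49, 87.8),
    "EGF_CA": (41, 90.2),
}


def aligned_residues(cd_length: int, percent_aligned: float) -> int:
    """Approximate number of domain residues covered by the alignment."""
    return round(cd_length * percent_aligned / 100)


counts = {name: aligned_residues(*hit) for name, hit in domain_hits.items()}

for name, n in counts.items():
    print(name, n, "of", domain_hits[name][0], "residues aligned")
```

Under that assumption the hits cover roughly 94 of 114, 43 of 49, and 37 of 41 domain residues respectively.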



The domain that characterizes epidermal growth factor (EGF) consists of
approximately 50 amino acids, and has been shown to be present in a more or less conserved
form in a large number of other, mostly animal proteins. EGF-like domains are believed to
play a critical role in a number of extracellular events, including cell adhesion and
receptor-ligand interactions. Proteins with EGF-like domains often consist of more than 1,000 amino
acids, have multiple copies of the EGF-like domain, and contain additional domains known to
be involved in specific protein-protein interactions. The list of proteins currently known to
contain one or more copies of an EGF-like pattern is large and varied. The functional
significance of EGF domains in what appear to be unrelated proteins is not yet clear. However,
a common feature is that these repeats are found in the extracellular domain of
membrane-bound proteins or in proteins known to be secreted (exception: prostaglandin G/H synthase).
The EGF domain includes six cysteine residues which have been shown (in EGF) to be
involved in 3 disulfide bonds. The main structure is a two-stranded beta-sheet followed by a
loop to a C-terminal short two-stranded sheet. Subdomains between the conserved cysteines
vary in length.

To identify proteins containing EGF-like domains, Nakayama et al. (1998) searched a
database of long cDNA sequences randomly selected from a human brain cDNA library for
those that encode an EGF-like motif. They identified several partial cDNAs encoding novel
proteins with multiple EGF-like domains, such as EGFL4, which they named MEGF8. The
predicted partial EGFL4 protein has a laminin-type EGF-like domain, 5 EGF-like domains,
and a transmembrane domain. Using a radiation hybrid mapping panel, Nakayama et al.
(1998) mapped the EGFL4 gene to 19q12.

The disclosed NOV7 nucleic acid of the invention encoding a MEGF8-like protein
includes the nucleic acid whose sequence is provided in Table 7A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 7A while still encoding a protein that maintains
its MEGF8-like activities and physiological functions, or a fragment of such a nucleic acid.
The invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 0 percent of the
bases may be so changed.

The disclosed NOV7 protein of the invention includes the MEGF8-like protein whose 
sequence is provided in Table 7B. The invention also includes a mutant or variant protein any 
of whose residues may be changed from the corresponding residue shown in Table 7B while 
still encoding a protein that maintains its MEGF8-like activities and physiological functions, 
or a functional fragment thereof. In the mutant or variant protein, up to about 0 percent of the
residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this MEGF8-like
protein (NOV7) may function as a member of a "MEGF8 family". Therefore, the NOV7
nucleic acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV7 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the MEGF8-like protein
(NOV7) may be useful in gene therapy, and the MEGF8-like protein (NOV7) may be useful
when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from Von Hippel-Lindau (VHL) syndrome, Alzheimer's disease, stroke, tuberous sclerosis,
hypercalcemia, Parkinson's disease, Huntington's disease, cerebral palsy, epilepsy,
Lesch-Nyhan syndrome, multiple sclerosis, ataxia telangiectasia, leukodystrophies, behavioral
disorders, addiction, anxiety, pain, neuroprotection, multiple sclerosis, myasthenia gravis,
systemic lupus erythematosus, autoimmune disease, asthma, emphysema, scleroderma,
allergy, ARDS, diabetes, renal artery stenosis, interstitial nephritis, glomerulonephritis,
polycystic kidney disease, renal tubular acidosis, IgA nephropathy, or other pathologies or
conditions. The NOV7 nucleic acid encoding the MEGF8-like protein of the invention, or
fragments thereof, may further be useful in diagnostic applications, wherein the presence or
amount of the nucleic acid or the protein are to be assessed.

NOV7 nucleic acids and polypeptides are further useful in the generation of antibodies
that bind immuno-specifically to the novel NOV7 substances for use in therapeutic or
diagnostic methods. These antibodies may be generated according to methods known in the
art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies"
section below. The disclosed NOV7 proteins have multiple hydrophilic regions, each of
which can be used as an immunogen. These novel proteins can be used in assay systems for
functional analysis of various human disorders, which will help in understanding of pathology
of the disease and development of new drug targets for various disorders.

NOV8 

NOV8 includes two protocadherin-like proteins disclosed below. The disclosed
sequences have been named NOV8a and NOV8b.

NOV8a 

A disclosed NOV8a nucleic acid of 6006 nucleotides (also referred to as CG57542-01) 
encoding a protocadherin-like protein is shown in Table 8A. Putative untranslated regions 
upstream and/or downstream from the coding region, if any, are underlined, and the start and 
stop codons are in bold letters.

Table 8A. NOV8a nucleotide sequence (SEQ ID NO:17).

CAAATGTTTCTTTCTACCTTTTCTTATTTCAGATCAGCTTTGAGTGAACTTTGACAGAAG
ATGTTTCGACAGTTTTATCTCTGGACATGTTTAGCTTCAGGGATCATCCTGGGCTCTCTC 
TTTGAAATCTGCTTGGGCCAGTATGATGATGGTAAGGATTGCAAACTAGCTAGGGGAGGA 
CCACCAGCTACCATAGTTGCTATTGATGAAGAAAGTCGGAATGGTGCAGGTACAATTCTG 
GTGGACAACATGCTGATCAAAGGGACTGCTGGAGGACCAGACCCCACCATAGAACTTTCT 
TTAAAGGATAATGTGGATTACTGGGTGTTGATGGATCCTGTTAAGCAAATGCTTTTCCTG 
AACAGCACCGGAAGAGTTCTGGATAGAGATCCACCGATGAACATACACTCCATTGTGGTG 
CAGGTCCAGTGCATCAACAAAAAAGTGGGCACTATTATCTACCATGAAGTGCGAATAGTG 
GTGAGAGACAGGAATGACAACTCACCCACTTTCAAGCATGAAAGCTACTATGCCACAGTG 
AATGAGCTCACTCCAGTTGGTACCACAATATTCACAGGATTTTCAGGAGACAATGGAGCT 
ACAGATATAGATGATGGACCAAATGGACAGATAGAGTATGTTATTCAGTATAATCCAGAT 
GATCCGACATCCAATGACACCTTTGAAATTCCCCTAATGTTGACTGGAAATATAGTGTTA 
AGGAAGAGGCTCAACTATGAAGATAAGACTCGCTACTTTGTCATAATCCAAGCTAATGAC 
CGTGCCCAAAATCTGAATGAGAGGCGAACCACCACCACCACTCTCACAGTGGATGTTCTG 
GATGGAGATGACTTGGGTCCAATGTTTCTTCCTTGTGTCCTTGTGCCAAACACTCGTGAT 
TGCCGTCCACTCACTTATCAAGCTGCCATACCTGAGTTGAGAACTCCGGAAGAACTGAAC 
CCCATTATTGTTACGCCACCAATCCAAGCCATTGATCAGGACCGGAATATTCAACCGCCA 
TCAGATAGGCCAGGAATCCTCTATTCCATCCTTGTTGGTGGGACTCCTGAGGATTACCCA 
CGATTTTTCCATATGCATCCTAGGACAGCAGAACTTAGTCTCCTGGAGCCAGTAAACAGA 
GACTTTCACCAGAAATTTGATTTGGTTATTAAGGCTGAACAAGACAATGGTCATCCTCTT 
CCTGCCTTTGCCGGTCTACACATTGAAATACTGGATGAAAACAATCAAAGTCCATATTTT 
ACAATGCCCAGTTATCAAGGCTATATCCTGGAATCTGCCCCAGTGGGAGCAACCATTTCG 
GACAGTCTCAATTTGACCTCACCTTTAAGAATAGTAGCTCTGGACAAGGACATAGAAGAT 
ACAAAAGACCCAGAGCTTCACCTTTTTCTGAATGACTACACCTCAGTCTTCACCGTCACA 
CAGACTGGTATTACTCGCTACCTCACCTTACTTCAACCAGTGGACAGGGAAGAACAGCAA 
ACTTACACCTTTTCGATAACAGCATTTGATGGTGTACAAGAAAGTGAGCCAGTCATCGTC 
AATATTCAAGTGATGGATGCAAATGATAACACGCCAACCTTCCCTGAAATATCCTATGAT 
GTGTATGTTTATACAGACATGAGACCTGGGGACAGTGTCATACAGCTCACTGCAGTCGAC 
GCAGACGAAGGGTCAAATGGGGAGATCACATATGAAATCCTTGTTGGGGCTCAGGGAGAC 
TTCATCATCAATAAAACAACAGGGCTTATCACCATCGCTCCAGGGGTGGAAATGATAGTC 
GGGCGGACTTACGCACTCACGGTCCAAGCAGCGGATAATGCTCCTCCTGCAGAGCGAAGG 
AACTCCATCTGCACTGTGTATATTGAAGTGCTTCCACCAAATAATCAAAGCCCTCCTCGC 
TTCCCACAGCTGATGTATAGCCTTGAAATTAGTGAAGCCATGAGGGTTGGTGCTGTTTTA 
TTAAATCTACAGGCAACTGATCGAGAGGGAGACTCAATAACATATGCCATTGAGAATGGA 
GATCCTCAGAGAGTTTTTAATCTTTCAGAAACCACGGGGATTCTAACCTTAGGGAAAGCA 
CTGGACAGGGAAAGCACTGATCGCTACATTCTGATCATCACAGCTTCAGATGGCAGGCCA 
GATGGGACCTCAACTGCCACAGTAAACATAATGGTGACAGATGTCAATGACAATGCTCCA 
GTGTTTGATCCTTATCTGCCAAGAAATTTATCTGTGGTGGAAGAAGAAGCCAATGCCTTT 
GTGGGTCAAGTAAAAGCAACAGACCCTGATGCTGGAATAAATGGTCAAGTGCACTACAGT 
TTGGGTAACTTTAATAATCTTTTTCGTATCACATCCAATGGGAGCATTTACACAGCAGTG 
AAGCTTAACAGAGAAGTCAGGGACTACTATGAACTTGTTGTTGTGGCAACAGATGGAGCA 
GTACACCCTCGTCATTCAACTCTAACCTTGGCCATCAAGGTTTTGGACATTGATGATAAC 
AGTCCTGTGTTCACCAATTCAACATACACTGTCCTTGTTGAAGAGAATTTGCCAGCTGGG 
ACTACCATCCTTCAAATAGAGGCCAAAGATGTCGACCTTGGAGCAAATGTGTCTTACCGG 
ATAAGAAGCCCAGAAGTGAAGCACTTTTTTGCACTACATCCATTTACAGGAGAACTATCG 
CTTTTAAGGAGTTTAGATTATGAGGCATTTCCAGACCAAGAAGCAAGTATCACTTTTCTG 
GTAGAGGCCTTTGATATTTATGGAACAATGCCACCTGGTATTGCTACTGTCACAGTGATT 
GTAAAGGATATGAATGATTATCCTCCTGTCTTTAGTAAACGAATATACAAAGGGATGGTG 
GCTCCGGATGCAGTCAAGGGTACACCTATCACAACAGTTTATGCTGAAGATGCAGACCCT
CCTGGATTACCTGCAAGTCGTGTGAGGTATAGAGTAGATGATGTACAGTTTCCTTACCCT 
GCCAGTATTTTTGAAGTGGAAGAAGATTCTGGAAGAGTAATAACACGAGTCAATCTTAAT 
GAAGAACCTACAACAATTTTTAAGTTGGTGGTGGTTGCTTTTGATGATGGGGAGCCTGTG 
ATGTCCAGCAGTGCCACAGTGAAGATTCTTGTCTTACATCCTGGTGAGATCCCACGCTTC 
ACACAGGAGGAATATAGGCCTCCTCCAGTAAGTGAACTTGCCACCAAAGGGACCATGGTT 
GGTGTAATTTCTGCTGCTGCCATTAATCAAAGTATTGTGTACTCCATTGTTTCAGGAAAT 
GAAGAAGATACATTTGGAATTAATAACATCACAGGTGTTATCTATGTGAATGGACCTCTG 
GATTATGAGACCAGGACAAGCTATGTACTTCGAGTCCAAGCTGATTCCCTGGAAGTGGTC 
CTTGCCAATCTCCGAGTTCCTTCAAAAAGTAATACAGCTAAAGTATACATTGAGATTCAG 
GATGAAAATAATCATCCCCCAGTGTTTCAGAAAAAATTCTACATCGGAGGTGTATCTGAA 
GATGCAAGAATGTTTACTTCTGTACTCAGAGTGAAGGCTACTGATAAAGATACTGGCAAT 
TATAGTGTCATGGCCTACAGACTCATAATACCACCAATTAAAGAGGGAAAAGAAGGATTT

GTAGTGGAAACATATACAGGGCTTATCAAAACTGCTATGCTCTTCCATAATATGAGGAGA 

TCCTACTTCAAGTTTCAAGTTATTGCAACTGACGACTATGGGAAGGGACTGAGCGGCAAA 

GCCGATGTACTCGTAAGTGTCTCCGTGGTCAATCAGCTGGATATGCAAGTCATTGTTTCC 

AATGTGCCTCCTACTCTAGTGGAAAAAAAGATAGAAGATCTTACAGAAATCTTGGATCGC 

TATGTTCAGGAACAAATTCCTGGTGCCAAGGTCGTAGTGGAGTCCATTGGAGCTCGCCGG 

CATGGAGATGCCTTTTCCCTAGAAGATTACACCAAATGTGACTTGACTGTCTATGCAATT 

GACCCCCAAACCAACAGAGCCATCGATAGAAATGAGCTTTTTAAGTTTTTGGATGGCAAA 

CTACTTGATATCAATAAAGACTTTCAGCCGTATTATGGGGAAGGAGGACGCATTCTGGAG 

ATCCGGACTCCAGAGGCAGTGACCAGCATTAAAAAGAGAGGAGAAAGTCTAGGATACACA

GAAGGGGCCTTGTTGGCTCTGGCCTTCATCATCATCCTCTGCTGCATTCCTGCCATCTTG 

GTGGTTTTGGTCAGCTACAGACAGTTTAAAGTACGTCAAGCTGAGTGTACACGTCAAGCT 

GAGTGTACAAAGACTGCACGAATTCAGGCCGCATTACCCGCGGCTAAACCAGCAGTGCCG 

GCTCCTGCACCAGTGGCAGCGCCCCCGCCGCCGCCGCCGCCTCCGCCAGGTGCGCATCTC 

TATGAAGAACTTGGAGACAGCTCAATGTCTTTTCTTTCAAGTCTTTTCCTTCTCTACCAT 

TTTCAACAAAGCAGGGGAAATAACTCAGTCTCAGAAGACAGGAAACATCAACAAGTTGTG 

ATGCCCTTTTCTTCCAATACTATTGAGGCTCACAAGTCAGCTCATGTAGACGGATCACTT 

AAGAGCAACAAACTGAAGTCTGCAAGAAAATTCACATTTCTATCTGATGAGGATGACTTA 

AGTGCCCATAATCCCCTTTATAAGGAAAACATAAGTCAAGTATCAACAAATTCAGACATT 

TCACAGAGAACAGATTTTGTAGACCCATTTTCACCCAAAATACAAGCCAAGAGTAAGTCT 

CTGAGGGGCCCAAGAGAAAAGATTCAGAGGCTGTGGAGTCAGTCAGTCAGCTTACCCAGG 

AGGCTGATGAGGAAAGTTCCAAATAGACCAGAGATCATAGATCTGCAGCAGTGGCAAGGC 

ACCAGGCAGAAAGCTGAAAATGAAAACACTGGAATCTGTACAAACAAAAGAGGTAGCAGC 

AATCCATTGCTTACAACTGAAGAGGCAAATTTGACAGAGAAAGAGGAAATAAGGCAAGGT 

GAAACACTGATGATAGAAGGAACAGAACAGTTGAAATCTCTCTCTTCAGACTCTTCATTT 

TGCTTTCCCAGGCCTCACTTCTCATTCTCCACTTTGCCAACTGTTTCAAGAACTGTGGAA 

CTCAAATCAGAACCTAATGTCATCAGTTCTCCTGCTGAGTGTTCCTTGGAACTTTCTCCT 

TCAAGGCCTTGTGTTTTACATTCTTCACTCTCTAGGAGAGAGACACCTATTTGTATGTTA 

CCTATTGAAACCGAAAGAAATATTTTTGAAAATTTTGCCCATCCACCAAACATCTCTCCT 

TCTGCCTGTCCCCTTCCCCCTCCTCCTCCTATTTCTCCTCCTTCTCCTCCTCCTGCTCCT 

GCTCCTCTTGCTCCTCCTCCTGACATTTCTCCTTTTTCTCTTTTTTGTCCTCCTCCCTCT 

CCTCCTTCTATCCCTCTTCCTCTTCCTCCTCCTACATTTTTTCCACTTTCCGTTTCAACG 

TCTGGTCCCCCAACACCACCTCTTCTACCTCCATTTCCAACTCCTCTTCCTCCACCACCT 

CCTTCTATTCCTTGCCCTCCACCTCCTTCAGCTTCATTTCTGTCCACAGAGTGTGTCTGT 

ATAACAGGTGTTAAATGCACGACCAACTTGATGCCTGCCGAGAAAATTAAGTCCTCTATG 

ACACAGCTATCAACAACGACAGTGTGTAAAACAGACCCTCAGAGAGAACCAAAAGGCATC 

CTCAGACACGTTAAAAACTTAGCAGAACTTGAAAAATCAGTAGCTAACATGTACAGTCAA 

ATAGAAAAAAACTATCTACGCACAAATGTTTCAGAACTTCAAACTATGTGCCCTTCAGAA 

GTAACAAATATGGAAATCACATCTGAACAAAACAAGGGGAGTTTGAACAATATTGTCGAG 

GGAACTGAAAAACAATCTCACAGTCAATCTACTTCACTGTAA TGTTGCTTTTCTTATTTT 

AGTCGG 



In a search of public sequence databases, the NOV8a nucleic acid sequence, located on
chromosome 10, has 557 of 955 bases (58%) identical to a
GENBANK-ID:AF169693|acc:AF169693.1 mRNA from Homo sapiens (Homo sapiens protocadherin 13
(PCDH13) mRNA, partial cds). Public nucleotide databases include all GenBank databases
and the GeneSeq patent database.

The disclosed NOV8a polypeptide (SEQ ID NO:18) encoded by SEQ ID NO:17 has
1973 amino acid residues and is presented in Table 8B using the one-letter amino acid code.
SignalP, PSORT and/or Hydropathy results predict that NOV8a has a signal peptide and is likely
to be localized extracellularly with a certainty of 0.6760. The most likely cleavage site is
between residues 26 and 27.

Table 8B. Encoded NOV8a protein sequence (SEQ ID NO:18).
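
The relationship between the coding region of SEQ ID NO:17 and the encoded polypeptide can be illustrated with a minimal Python sketch. It assumes the standard genetic code and a reading frame that begins at the ATG start codon; the function name `translate` and the variable `orf_start` are hypothetical and are not part of the disclosure. The input is the first 30 coding bases shown in Table 8A.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed to apply here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(nt: str) -> str:
    """Translate from the first base and stop at the first stop codon."""
    nt = "".join(nt.split()).upper()  # drop whitespace left over from layout
    protein = []
    for i in range(0, len(nt) - 2, 3):
        aa = CODON_TABLE[nt[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 30 coding bases of Table 8A, beginning at the ATG start codon.
orf_start = "ATGTTTCGACAGTTTTATCTCTGGACATGT"
print(translate(orf_start))  # MFRQFYLWTC

# A 1973-residue polypeptide plus one stop codon spans (1973 + 1) * 3 bases.
print((1973 + 1) * 3)  # 5922
```

The ten residues produced match the first ten residues of Table 8B, and the 5922-base coding span is consistent with a 6006-nucleotide sequence that also carries untranslated regions.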



MFRQFYLWTCLASGIILGSLFEICLGQYDDGKDCKLARGGPPATIVAIDEESRNGAGTIL 
VDNMLIKGTAGGPDPTIELSLKDNVDYWVLMDPVKQMLFLNSTGRVLDRDPPMNIHSIVV 
QVQCINKKVGTIIYHEVRIVVRDRNDNSPTFKHESYYATVNELTPVGTTIFTGFSGDNGA
TDIDDGPNGQIEYVIQYNPDDPTSNDTFEIPLMLTGNIVLRKRLNYEDKTRYFVIIQAND 
RAQNLNERRTTTTTLTVDVLDGDDLGPMFLPCVLVPNTRDCRPLTYQAAIPELRTPEELN 
PIIVTPPIQAIDQDRNIQPPSDRPGILYSILVGGTPEDYPRFFHMHPRTAELSLLEPVNR
DFHQKFDLVIKAEQDNGHPLPAFAGLHIEILDENNQSPYFTMPSYQGYILESAPVGATIS 
DSLNLTSPLRIVALDKDIEDTKDPELHLFLNDYTSVFTVTQTGITRYLTLLQPVDREEQQ 
TYTFSITAFDGVQESEPVIVNIQVMDANDNTPTFPEISYDVYVYTDMRPGDSVIQLTAVD
ADEGSNGEITYEILVGAQGDFIINKTTGLITIAPGVEMIVGRTYALTVQAADNAPPAERR 
NSICTVYIEVLPPNNQSPPRFPQLMYSLEISEAMRVGAVLLNLQATDREGDSITYAIENG 
DPQRVFNLSETTGILTLGKALDRESTDRYILIITASDGRPDGTSTATVNIMVTDVNDNAP
VFDPYLPRNLSVVEEEANAFVGQVKATDPDAGINGQVHYSLGNFNNLFRITSNGSIYTAV
KLNREVRDYYELVVVATDGAVHPRHSTLTLAIKVLDIDDNSPVFTNSTYTVLVEENLPAG
TTILQIEAKDVDLGANVSYRIRSPEVKHFFALHPFTGELSLLRSLDYEAFPDQEASITFL 
VEAFDIYGTMPPGIATVTVIVKDMNDYPPVFSKRIYKGMVAPDAVKGTPITTVYAEDADP
PGLPASRVRYRVDDVQFPYPASIFEVEEDSGRVITRVNLNEEPTTIFKLVVVAFDDGEPV
MSSSATVKILVLHPGEIPRFTQEEYRPPPVSELATKGTMVGVISAAAINQSIVYSIVSGN
EEDTFGINNITGVIYVNGPLDYETRTSYVLRVQADSLEVVLANLRVPSKSNTAKVYIEIQ
DENNHPPVFQKKFYIGGVSEDARMFTSVLRVKATDKDTGNYSVMAYRLIIPPIKEGKEGF
VVETYTGLIKTAMLFHNMRRSYFKFQVIATDDYGKGLSGKADVLVSVSVVNQLDMQVIVS
NVPPTLVEKKIEDLTEILDRYVQEQIPGAKVVVESIGARRHGDAFSLEDYTKCDLTVYAI
DPQTNRAIDRNELFKFLDGKLLDINKDFQPYYGEGGRILEIRTPEAVTSIKKRGESLGYT 
EGALLALAFIIILCCIPAILVVLVSYRQFKVRQAECTRQAECTKTARIQAALPAAKPAVP
APAPVAAPPPPPPPPPGAHLYEELGDSSMSFLSSLFLLYHFQQSRGNNSVSEDRKHQQVV
MPFSSNTIEAHKSAHVDGSLKSNKLKSARKFTFLSDEDDLSAHNPLYKENISQVSTNSDI 
SQRTDFVDPFSPKIQAKSKSLRGPREKIQRLWSQSVSLPRRLMRKVPNRPEIIDLQQWQG 
TRQKAENENTGICTNKRGSSNPLLTTEEANLTEKEEIRQGETLMIEGTEQLKSLSSDSSF
CFPRPHFSFSTLPTVSRTVELKSEPNVISSPAECSLELSPSRPCVLHSSLSRRETPICML 
PIETERNIFENFAHPPNISPSACPLPPPPPISPPSPPPAPAPLAPPPDISPFSLFCPPPS 
PPSIPLPLPPPTFFPLSVSTSGPPTPPLLPPFPTPLPPPPPSIPCPPPPSASFLSTECVC 
ITGVKCTTNLMPAEKIKSSMTQLSTTTVCKTDPQREPKGILRHVKNLAELEKSVANMYSQ 
IEKNYLRTNVSELQTMCPSEVTNMEITSEQNKGSLNNIVEGTEKQSHSQSTSL 



A search of sequence databases reveals that the NOV8a amino acid sequence has 1580
of 1846 amino acid residues (85%) identical to, and 1682 of 1846 amino acid residues (91%)
similar to, the 1943 amino acid residue ptnr:TREMBLNEW-ACC:AAG53891 protein from
Mus musculus (Mouse) (PROTOCADHERIN). Public amino acid databases include the
GenBank databases, SwissProt, PDB and PIR.
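
The percentages quoted for these database searches follow directly from the match counts. The short Python sketch below reproduces them under the assumption that the reported values are truncated to a whole percent rather than rounded, which is what the figures in this section suggest; the function name `percent_truncated` is hypothetical.

```python
def percent_truncated(matches: int, aligned_length: int) -> int:
    """Return matches / aligned_length as a whole percent, truncated."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


# Figures quoted in the text for NOV8a and NOV8b.
print(percent_truncated(557, 955))    # 58  (NOV8a nucleic acid identity)
print(percent_truncated(1580, 1846))  # 85  (NOV8a amino acid identity)
print(percent_truncated(1682, 1846))  # 91  (NOV8a amino acid similarity)
print(percent_truncated(3708, 4369))  # 84  (NOV8b nucleic acid identity)
```

Rounding to the nearest percent would give 86% and 85% for the second and fourth cases, so truncation is the reading that agrees with all four quoted values.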

NOV8a is expressed in at least brain, lymphoid tissue, and placenta. This information was
derived by determining the tissue sources of the sequences that were included in the invention
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or
RACE sources.






NOV8b 

A disclosed NOV8b nucleic acid of 6003 nucleotides (also referred to as CG57452-02) 
encoding a protocadherin-like protein is shown in Table 8C. Putative untranslated regions 
upstream and/or downstream from the coding region, if any, are underlined, and the start and 
stop codons are in bold letters.



Table 8C. NOV8b nucleotide sequence (SEQ ID NO:19).

CAAATGTTTCTTTCTACCTTTTCTTATTTCAGATCAGCTTTGAGTGAACTTTGACAGAAG 
ATGTTTCGACAGTTTTATCTCTGGACATGTTTAGCTTCAGGGATCATCCTGGGCTCTCTC 
TTTGAAATCTGCTTGGGCCAGTATGATGATGGTAAGGATTGCAAACTAGCTAGGGGAGGA 
CCACCAGCTACCATAGTTGCTATTGATGAAGAAAGTCGGAATGGTGCAGGTACAATTCTG 
GTGGACAACATGCTGATCAAAGGGACTGCTGGAGGACCAGACCCCACCATAGAACTTTCT 
TTAAAGGATAATGTGGATTACTGGGTGTTGATGGATCCTGTTAAGCAAATGCTTTTCCTG 
AACAGCACCGGAAGAGTTCTGGATAGAGATCCACCGATGAACATACACTCCATTGTGGTG 
CAGGTCCAGTGCATCAACAAAAAAGTGGGCACTATTATCTACCATGAAGTGCGAATAGTG 
GTGAGAGACAGGAATGACAACTCACCCACTTTCAAGCATGAAAGCTACTATGCCACAGTG 
AATGAGCTCACTCCAGTTGGTACCACAATATTCACAGGATTTTCAGGAGACAATGGAGCT 
ACAGATATAGATGATGGACCAAATGGACAGATAGAGTATGTTATTCAGTATAATCCAGAT 
GATCCGACATCCAATGACACCTTTGAAATTCCCCTAATGTTGACTGGAAATATAGTGTTA 
AGGAAGAGGCTCAACTATGAAGATAAGACTCGCTACTTTGTCATAATCCAAGCTAATGAC 
CGTGCCCAAAATCTGAATGAGAGGCGAACCACCACCACCACTCTCACAGTGGATGTTCTG 
GATGGAGATGACTTGGGTCCAATGTTTCTTCCTTGTGTCCTTGTGCCAAACACTCGTGAT 
TGCCGTCCACTCACTTATCAAGCTGCCATACCTGAGTTGAGAACTCCGGAAGAACTGAAC 
CCCATTATTGTTACGCCACCAATCCAAGCCATTGATCAGGACCGGAATATTCAACCGCCA 
TCAGATAGGCCAGGAATCCTCTATTCCATCCTTGTTGGGACTCCTGAGGATTACCCACGA 
TTTTTCCATATGCATCCTAGGACAGCAGAACTTAGTCTCCTGGAGCCAGTAAACAGAGAC 
TTTCACCAGAAATTTGATTTGGTTATTAAGGCTGAACAAGACAATGGTCATCCTCTTCCT 
GCCTTTGCCAGTCTACACATTGAAATACTGGATGAAAACAATCAAAGTCCATATTTTACA 
ATGCCCAGTTATCAAGGCTATATCCTGGAATCTGCCCCAGTGGGAGCAACCATTTCGGAC 
AGTCTCAATTTGACTTCACCTTTAAGAATAGTAGCTCTGGACAAGGACATAGAAGATACA 
AAAGACCCAGAGCTTCACCTTTTTCTGAATGACTACACCTCAGTCTTCACCGTCACACAG 
ACTGGTATTACTCGCTACCTCAGCTTACTTCAACCAGTGGACAGGGAAGAACAGCAAACT 
TACACCTTTTCGATAACAGCATTTGATGGTGTACAAGAAAGTGAGCCAGTCATCGTCAAT 
ATTCAAGTGATGGATGCAAATGATAACACGCCAACCTTCCCTGAAATATCCTATGATGTG 
TATGTTTATACAGACATGAGACCTGGGGACAGTGTCATACAGCTCACTGCAGTCGACGCA 
GACGAAGGGTCAAATGGGGAGATCACATATGAAATCCTTGTTGGGGCTCAGGGAGACTTC 
ATCATCAATAAAACAACAGGGCTTATCACCATCGCTCCAGGGGTGGAAATGATAGTCGGG 
CGGACTTACGCACTCACGGTCCAAGCAGCGGATAATGCTCCTCCTGCAGAGCGAAGGAAC 
TCCATCTGCACTGTGTATATTGAAGTGCTTCCACCAAATAATCAAAGCCCTCCTCGCTTC 
CCACAGCTGATGTATAGCCTTGAAATTAGTGAAGCCATGAGGGTTGGTGCTGTTTTATTA 
AATCTACAGGCAACTGATCGAGAGGGAGACTCAATAACATATGCCATTGAGAATGGAGAT 
CCTCAGAGAGTTTTTAATCTTTCAGAAACCACGGGGATTCTAACCTTAGGGAAAGCACTG 
GACAGGGAAAGCACTGATCGCTACATTCTGATCATCACAGCTTCAGATGGCAGGCCAGAT 
GGGACCTCAACTGCCACAGTAAACATAGTGGTGACAGATGTCAATGACAATGCTCCAGTG 
TTTGATCCTTATCTGCCAAGAAATTTATCTGTGGTGGAAGAAGAAGCCAATGCCTTTGTG 
GGTCAAGTAAAAGCAACAGACCCTGATGCTGGAATAAATGGTCAAGTGCACTACAGTTTG 
GGTAACTTTAATAATCTTTTTCGTATCACATCCAATGGGAGCATTTACACAGCAGTGAAG 
CTTAACAGAGAAGTCAGGGACTACTATGAACTTGTTGTTGTGGCAACAGATGGAGCAGTA 
CACCCTCGTCATTCAACTCTAACCTTGGCCATCAAGGTTTTGGACATTGATGATAACAGT 
CCTGTGTTCACCAATTCAACATACACTGTCCTTGTTGAAGAGAATTTGCCAGCTGGGACT 
ACCATCCTTCAAATAGAGGCCAAAGATGTCGACCTTGGAGCAAATGTGTCTTACCGGATA 
AGAAGCCCAGAAGTGAAGCACTTTTTTGCACTACATCCATTTACAGGAGAACTATCGCTT 
TTAAGGAGTTTAGATTATGAGGCATTTCCAGACCAAGAAGCAAGTATCACTTTTCTGGTA 
GAGGCCTTTGATATTTATGGAACAATGCCACCTGGTATTGCTACTGTCACAGTGATTGTA 
AAGGATATGAATGATTATCCTCCTGTCTTTAGTAAACGAATATACAAAGGGATGGTGGCT
CCGGATGCAGTCAAGGGTACACCTATCACAACAGTTTATGCTGAAGATGCAGACCCTCCT 
GGATTACCTGCAAGTCGTGTGAGGTATAGAGTAGATGATGTACAGTTTCCTTACCCTGCC
AGTATTTTTGAAGTGGAAGAAGATTCTGGAAGAGTAATAACACGAGTCAATCTTAATGAA 
GAACCTACAACAATTTTTAAGTTGGTGGTGGTTGCTTTTGATGATGGGGAGCCTGTGATG
TCCAGCAGTGCCACAGTGAAGATTCTTGTCTTACATCCTGGTGAGATCCCACGCTTCACA 
CAGGAGGAATATAGGCCTCCTCCAGTAAGTGAACTTGCCACCAAAGGGACCATGGTTGGT 
GTAATTTCTGCTGCTGCCATTAATCAAAGTATTGTGTACTCCATTGTTTCAGGAAATGAA 
GAAGATACATTTGGAATTAATAACATCACAGGTGTTATCTATGTGAATGGACCTCTGGAT 
TATGAGACCAGGACAAGCTATGTACTTCGAGTCCAAGCTGATTCCCTGGAAGTGGTCCTT 
GCCAATCTCCGAGTTCCTTCAAAAAGTAATACAGCTAAAGTATACATTGAGATTCAGGAT 
GAAAATAATCATCCCCCAGTGTTTCAGAAAAAATTCTACATCGGAGGTGTATCTGAAGAT 
GCAAGAATGTTTACTTCTGTACTCAGAGTGAAGGCTACTGATAAAGATACTGGCAATTAT 
AGTGTCATGGCCTACAGACTCATAATACCACCAATTAAAGAGGGAAAAGAAGGATTTGTA 
GTGGAAACATATACAGGGCTTATCAAAACTGCTATGCTCTTCCATAATATGAGGAGATCC 
TACTTCAAGTTTCAAGTTATTGCAACTGACGACTATGGGAAGGGACTGAGCGGCAAAGCC 
GATGTACTCGTAAGTGTCTCCGTGGTCAATCAGCTGGATATGCAAGTCATTGTTTCCAAT 
GTGCCTCCTACTCTAGTGGAAAAAAAGATAGAAGATCTTACAGAAATCTTGGATCGCTAT 
GTTCAGGAACAAATTCCTGGTGCCAAGGTCGTAGTGGAGTCCATTGGAGCTCGCCGGCAT 
GGAGATGCCTTTTCCCTAGAAGATTACACCAAATGTGACTTGACTGTCTATGCAATTGAC 
CCCCAAACCAACAGAGCCATCGATAGAAATGAGCTTTTTAAGTTTTTGGATGGCAAACTA 
CTTGATATCAATAAAGACTTTCAGCCGTATTATGGGGAAGGAGGACGCATTCTGGAGATC 
CGGACTCCAGAGGCAGTGACCAGCATTAAAAAGAGAGGAGAAAGTCTAGGATACACAGAA 
GGGGCCTTGTTGGCTCTGGCCTTCATCATCATCCTCTGCTGCATTCCTGCCATCTTGGTG 
GTTTTGGTCAGCTACAGACAGTTTAAAGTACGTCAAGCTGAGTGTACACGTCAAGCTGAG 
TGTACAAAGACTGCACGAATTCAGGCCGCATTACCCGCGGCTAAACCAGCAGTGCCGGCT 
CCTGCACCAGTGGCAGCGCCCCCGCCGCCGCCGCCGCCTCCGCCAGGTGCGCATCTCTAT 
GAAGAACTTGGAGACAGCTCAATGTCTTTTCTTTCAAGTCTTTTCCTTCTCTACCATTTT 
CAACAAAGCAGGGGAAATAACTCAGTCTCAGAAGACAGGAAACATCAACAAGTTGTGATG 
CCCTTTTCTTCCAATACTATTGAGGCTCACAAGTCAGCTCATGTAGACGGATCACTTAAG 
AGCAACAAACTGAAGTCTGCAAGAAAATTCACATTTCTATCTGATGAGGATGACTTAAGT 
GCCCATAATCCCCTTTATAAGGAAAACATAAGTCAAGTATCAACAAATTCAGACATTTCA 
CAGAGAACAGATTTTGTAGACCCATTTTCACCCAAAATACAAGCCAAGAGTAAGTCTCTG 
AGGGGCCCAAGAGAAAAGATTCAGAGGCTGTGGAGTCAGTCAGTCAGCTTACCCAGGAGG 
CTGATGAGGAAAGTTCCAAATAGACCAGAGATCATAGATCTGCAGCAGTGGCAAGGCACC 
AGGCAGAAAGCTGAAAATGAAAACACTGGAATCTGTACAAACAAAAGAGGTAGCAGCAAT 
CCATTGCTTACAACTGAAGAGGCAAATTTGACAGAGAAAGAGGAAATAAGGCAAGGTGAA 
ACACTGATGATAGAAGGAACAGAACAGTTGAAATCTCTCTCTTCAGACTCTTCATTTTGC 
TTTCCCAGGCCTCACTTCTCATTCTCCACTTTGCCAACTGTTTCAAGAACTGTGGAACTC 
AAATCAGAACCTAATGTCATCAGTTCTCCTGCTGAGTGTTCCTTGGAACTTTCTCCTTCA 
AGGCCTTGTGTTTTACATTCTTCACTCTCTAGGAGAGAGACACCTATTTGTATGTTACCT 
ATTGAAACCGAAAGAAATATTTTTGAAAATTTTGCCCATCCACCAAACATCTCTCCTTCT 
GCCTGTCCCCTTCCCCCTCCTCCTCCTATTTCTCCTCCTTCTCCTCCTCCTGCTCCTGCT 
CCTCTTGCTCCTCCTCCTGACATTTCTCCTTTTTCTCTTTTTTGTCCTCCTCCCTCTCCT 
CCTTCTATCCCTCTTCCTCTTCCTCCTCCTACATTTTTTCCACTTTCCGTTTCAACGTCT 
GGTCCCCCAACACCACCTCTTCTACCTCCATTTCCAACTCCTCTTCCTCCACCACCTCCT 
TCTATTCCTTGCCCTCCACCTCCTTCAGCTTCATTTCTGTCCACAGAGTGTGTCTGTATA 
ACAGGTGTTAAATGCACGACCAACTTGATGCCTGCCGAGAAAATTAAGTCCTCTATGACA 
CAGCTATCAACAACGACAGTGTGTAAAACAGACCCTCAGAGAGAACCAAAAGGCATCCTC 
AGACACGTTAAAAACTTAGCAGAACTTGAAAAATCAGTAGCTAACATGTACAGTCAAATA 
GAAAAAAACTATCTACGCACAAATGTTTCAGAACTTCAAACTATGTGCCCTTCAGAAGTA 
ACAAATATGGAAATCACATCTGAACAAAACAAGGGGAGTTTGAACAATATTGTCGAGGGA 
ACTGAAAAACAATCTCACAGTCAATCTACTTCACTGTAA TGTTGCTTTTCTTATTTTAGT 
CGG 



In a search of public sequence databases, the NOV8b nucleic acid sequence has 3708 
of 4369 bases (84%) identical to a gb:GENBANK-ID:AF281899|acc:AF281899.1 mRNA
from Mus musculus (Mus musculus protocadherin (av) mRNA, complete cds). Public 
nucleotide databases include all GenBank databases and the GeneSeq patent database. 

The disclosed NOV8b polypeptide (SEQ ID NO:20) encoded by SEQ ID NO:19 has
1972 amino acid residues and is presented in Table 8D using the one-letter amino acid code.
SignalP, PSORT and/or Hydropathy results predict that NOV8b has a signal peptide and is
likely to be localized at the plasma membrane with a certainty of 0.6760. The most likely
cleavage site is between amino acids 26 and 27.

Table 8D. Encoded NOV8b protein sequence (SEQ ID NO:20). 

MFRQFYLWTCLASGIILGSLFEICLGQYDDGKDCKLARGGPPATIVAIDEESRNGAGTIL 
VDNMLIKGTAGGPDPTIELSLKDNVDYWVLMDPVKQMLFLNSTGRVLDRDPPMNIHSIVV
QVQCINKKVGTIIYHEVRIVVRDRNDNSPTFKHESYYATVNELTPVGTTIFTGFSGDNGA
TDIDDGPNGQIEYVIQYNPDDPTSNDTFEIPLMLTGNIVLRKRLNYEDKTRYFVIIQAND
RAQNLNERRTTTTTLTVDVLDGDDLGPMFLPCVLVPNTRDCRPLTYQAAIPELRTPEELN 
PIIVTPPIQAIDQDRNIQPPSDRPGILYSILVGTPEDYPRFFHMHPRTAELSLLEPVNRD 
FHQKFDLVIKAEQDNGHPLPAFASLHIEILDENNQSPYFTMPSYQGYILESAPVGATISD
SLNLTSPLRIVALDKDIEDTKDPELHLFLNDYTSVFTVTQTGITRYLSLLQPVDREEQQT
YTFSITAFDGVQESEPVIVNIQVMDANDNTPTFPEISYDVYVYTDMRPGDSVIQLTAVDA 
DEGSNGEITYEILVGAQGDFIINKTTGLITIAPGVEMIVGRTYALTVQAADNAPPAERRN
SICTVYIEVLPPNNQSPPRFPQLMYSLEISEAMRVGAVLLNLQATDREGDSITYAIENGD
PQRVFNLSETTGILTLGKALDRESTDRYILIITASDGRPDGTSTATVNIVVTDVNDNAPV
FDPYLPRNLSVVEEEANAFVGQVKATDPDAGINGQVHYSLGNFNNLFRITSNGSIYTAVK
LNREVRDYYELVVVATDGAVHPRHSTLTLAIKVLDIDDNSPVFTNSTYTVLVEENLPAGT
TILQIEAKDVDLGANVSYRIRSPEVKHFFALHPFTGELSLLRSLDYEAFPDQEASITFLV 
EAFDIYGTMPPGIATVTVIVKDMNDYPPVFSKRIYKGMVAPDAVKGTPITTVYAEDADPP
GLPASRVRYRVDDVQFPYPASIFEVEEDSGRVITRVNLNEEPTTIFKLVVVAFDDGEPVM
SSSATVKILVLHPGEIPRFTQEEYRPPPVSELATKGTMVGVISAAAINQSIVYSIVSGNE 
EDTFGINNITGVIYVNGPLDYETRTSYVLRVQADSLEVVLANLRVPSKSNTAKVYIEIQD
ENNHPPVFQKKFYIGGVSEDARMFTSVLRVKATDKDTGNYSVMAYRLIIPPIKEGKEGFV 
VETYTGLIKTAMLFHNMRRSYFKFQVIATDDYGKGLSGKADVLVSVSVVNQLDMQVIVSN
VPPTLVEKKIEDLTEILDRYVQEQIPGAKVVVESIGARRHGDAFSLEDYTKCDLTVYAID
PQTNRAIDRNELFKFLDGKLLDINKDFQPYYGEGGRILEIRTPEAVTSIKKRGESLGYTE
GALLALAFIIILCCIPAILVVLVSYRQFKVRQAECTRQAECTKTARIQAALPAAKPAVPA
PAPVAAPPPPPPPPPGAHLYEELGDSSMSFLSSLFLLYHFQQSRGNNSVSEDRKHQQVVM
PFSSNTIEAHKSAHVDGSLKSNKLKSARKFTFLSDEDDLSAHNPLYKENISQVSTNSDIS 
QRTDFVDPFSPKIQAKSKSLRGPREKIQRLWSQSVSLPRRLMRKVPNRPEIIDLQQWQGT 
RQKAENENTGICTNKRGSSNPLLTTEEANLTEKEEIRQGETLMIEGTEQLKSLSSDSSFC
FPRPHFSFSTLPTVSRTVELKSEPNVISSPAECSLELSPSRPCVLHSSLSRRETPICMLP 
IETERNIFENFAHPPNISPSACPLPPPPPISPPSPPPAPAPLAPPPDISPFSLFCPPPSP
PSIPLPLPPPTFFPLSVSTSGPPTPPLLPPFPTPLPPPPPSIPCPPPPSASFLSTECVCI 
TGVKCTTNLMPAEKIKSSMTQLSTTTVCKTDPQREPKGILRHVKNLAELEKSVANMYSQI 
EKNYLRTNVSELQTMCPSEVTNMEITSEQNKGSLNNIVEGTEKQSHSQSTSL 



A search of sequence databases reveals that the NOV8b amino acid sequence has 3708 
of 4369 bases (84%) identical to a gb:GENBANK-ID:AF281899|acc:AF281899.1 mRNA
from Mus musculus (Mus musculus protocadherin (av) mRNA, complete cds). Public amino 
acid databases include the GenBank databases, SwissProt, PDB and PIR. 

NOV8b is expressed in at least adrenal gland, bone marrow, brain - amygdala, brain -
cerebellum, brain - hippocampus, brain - substantia nigra, brain - thalamus, brain - whole, fetal
brain, fetal kidney, fetal liver, fetal lung, heart, kidney, lymphoma - Raji, mammary gland,
pancreas, pituitary gland, placenta, prostate, salivary gland, skeletal muscle, small intestine,
spinal cord, spleen, stomach, testis, thyroid, trachea and uterus. This information was derived
by determining the tissue sources of the sequences that were included in the invention
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or
RACE sources.

The disclosed NOV8a polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 8E. 



Table 8E. BLAST results for NOV8a

Gene Index/Identifier          Protein/Organism                    Length (aa)  Identity (%)     Positives (%)    Expect

gi|16933555|ref|NP_149045.2|   protocadherin 15 precursor; Usher   1955         1954/1973 (99%)  1955/1973 (99%)  0.0
(NM_033056)                    syndrome 1F (autosomal recessive,
                               severe) [Homo sapiens]

gi|14581464|gb|AAK31804.1|     protocadherin 15 [Homo sapiens]     1955         1955/1973 (98%)  1955/1973 (98%)  0.0
(AY029237)

gi|15072441|gb|AAK31581.1|     protocadherin 15 [Homo sapiens]     1955         1949/1973 (98%)  1955/1973 (98%)  0.0
(AY029205)

gi|12963485|ref|NP_075604.1|   protocadherin 15; Ames waltzer      1943         1626/1977 (82%)  1745/1977 (88%)  0.0
(NM_023115)                    [Mus musculus]

gi|18574084|ref|XP_053625.2|   protocadherin 15 precursor          936          924/928 (99%)    925/928 (99%)    0.0
(XM_053625)                    [Homo sapiens]


Table 8F lists the domain descriptions from DOMAIN analysis results against NOV8. 
This indicates that the NOV8 sequence has properties similar to those of other proteins known 
to contain this domain. 

Table 8F. Domain Analysis of NOV8

gnl|Smart|smart00112, CA, Cadherin repeats; Cadherins are
glycoproteins involved in Ca2+-mediated cell-cell adhesion. Cadherin
domains occur as repeats in the extracellular regions which are
thought to mediate cell-cell contact when bound to calcium.

CD-Length = 82 residues, 100.0% aligned

Score = 83.2 bits (204), Expect = 1e-16



The disclosed NOV8 polypeptides are members of the protocadherin family, which in
turn is one of the six subfamilies of the cadherin superfamily. Cadherins are membrane-
associated glycoproteins that mediate cell-cell interactions in a calcium-dependent fashion.
Protocadherins may act as cell-cell recognition molecules and may be involved in signal
transduction cascades.

The disclosed NOV8 polypeptides have homology to the mouse protocadherin whose
mutant version causes the Ames waltzer mouse phenotype, which includes deafness and a
balance disorder due to degeneration of the neuroepithelium of the inner ear. Mutant mice
show abnormal stereocilia in the inner ear at a very early age. The gene of the invention may
therefore have a role in developmental processes, cellular communication and disease
processes such as cancer.

The disclosed NOV8 nucleic acid of the invention encoding a protocadherin-like
protein includes the nucleic acid whose sequence is provided in Table 8A or 8C, or a fragment
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may
be changed from the corresponding base shown in Table 8A or 8C while still encoding a
protein that maintains its protocadherin-like activities and physiological functions, or a
fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences
are complementary to those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 42 percent of the bases may be so changed.
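
A complementary nucleic acid of the kind referred to above can be derived from a disclosed strand by reverse complementation. The following Python sketch assumes an unmodified DNA alphabet (A, C, G, T) and the usual 5' to 3' reading convention; the function name `reverse_complement` is hypothetical. The input is the first ten coding bases of Table 8A.

```python
# Watson-Crick pairing for unmodified DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


fragment = "ATGTTTCGAC"  # first ten coding bases of Table 8A
print(reverse_complement(fragment))  # GTCGAAACAT
```

Applying the function twice returns the original fragment, which is a quick consistency check for any sequence written in this alphabet.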

The disclosed NOV8 protein of the invention includes the protocadherin-like protein
whose sequence is provided in Table 8B or 8D. The invention also includes a mutant or
variant protein any of whose residues may be changed from the corresponding residue shown
in Table 8B or 8D while still encoding a protein that maintains its protocadherin-like activities
and physiological functions, or a functional fragment thereof. In the mutant or variant protein,
up to about 15 percent of the residues may be so changed.
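
Whether a candidate variant falls within the stated limit can be checked by counting changed positions. The Python sketch below is an illustration only: it assumes the two sequences are already aligned and of equal length, so insertions and deletions are outside its scope, and the names `fraction_changed` and `MAX_PROTEIN_CHANGE` are hypothetical. The two ten-residue fragments are taken from corresponding positions of Table 8B and Table 8D.

```python
MAX_PROTEIN_CHANGE = 0.15  # "up to about 15 percent" per the text above


def fraction_changed(reference: str, variant: str) -> float:
    """Fraction of positions that differ between two equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to equal length")
    if not reference:
        raise ValueError("sequences must not be empty")
    differences = sum(a != b for a, b in zip(reference, variant))
    return differences / len(reference)


nov8a_fragment = "AFAGLHIEIL"  # from Table 8B
nov8b_fragment = "AFASLHIEIL"  # corresponding residues from Table 8D

changed = fraction_changed(nov8a_fragment, nov8b_fragment)
print(changed)                        # 0.1
print(changed <= MAX_PROTEIN_CHANGE)  # True
```

The same function applies to nucleic acid variants if the threshold is replaced with the 42 percent figure given for bases.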

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this protocadherin-like
protein (NOV8) may function as a member of a "protocadherin family". Therefore, the NOV8
nucleic acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV8 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the protocadherin-like
protein (NOV8) may be useful in gene therapy, and the protocadherin-like protein (NOV8)
may be useful when administered to a subject in need thereof. By way of nonlimiting
example, the compositions of the present invention will have efficacy for treatment of patients
suffering from Von Hippel-Lindau (VHL) syndrome, Alzheimer's disease, stroke, tuberous
sclerosis, hypercalcemia, Parkinson's disease, Huntington's disease, cerebral palsy, epilepsy,
Lesch-Nyhan syndrome, multiple sclerosis, ataxia-telangiectasia, leukodystrophies, behavioral
disorders, addiction, anxiety, pain, neurodegeneration, hemophilia, hypercoagulation, idiopathic
thrombocytopenic purpura, autoimmune disease, allergies, immunodeficiencies,
transplantation, graft versus host disease (GVHD), lymphaedema, hearing loss, tinnitus,
balance disorders, cardiomyopathy, atherosclerosis, hypertension, congenital heart defects,
aortic stenosis, atrial septal defect (ASD), atrioventricular (A-V) canal defect, ductus
arteriosus, pulmonary stenosis, subaortic stenosis, ventricular septal defect (VSD), valve
diseases, tuberous sclerosis, scleroderma, obesity, transplantation, cancer, tissue degeneration,
bacterial/viral/parasitic infection, or other pathologies or conditions. The NOV8 nucleic acid
encoding the protocadherin-like protein of the invention, or fragments thereof, may further be
useful in diagnostic applications, wherein the presence or amount of the nucleic acid or the
protein is to be assessed.

NOV8 nucleic acids and polypeptides are further useful in the generation of antibodies 
that bind immuno-specifically to the novel NOV8 substances for use in therapeutic or 
diagnostic methods. These antibodies may be generated according to methods known in the 
art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies" 
section below. The disclosed NOV8 proteins have multiple hydrophilic regions, each of 
which can be used as an immunogen. These novel proteins can be used in assay systems for 
functional analysis of various human disorders, which will help in understanding of pathology 
of the disease and development of new drug targets for various disorders. 
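
The text refers to prediction from hydrophobicity charts without naming a scale. As an illustration only, the Python sketch below uses the Kyte-Doolittle hydropathy scale with a sliding window, which is one common way to locate hydrophilic stretches; the choice of scale, the window length of 7, and the function names are assumptions and are not taken from the disclosure. The example input is the first 40 residues of Table 8B.

```python
# Kyte-Doolittle hydropathy values (assumed scale; negative means hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(protein: str, window: int = 7) -> list:
    """Mean hydropathy of every full-length window along the sequence."""
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def most_hydrophilic_window(protein: str, window: int = 7):
    """Return (start index, peptide) of the window with the lowest mean score."""
    means = window_hydropathy(protein, window)
    start = min(range(len(means)), key=means.__getitem__)
    return start, protein[start:start + window]


# First 40 residues of the NOV8a polypeptide shown in Table 8B.
n_terminus = "MFRQFYLWTCLASGIILGSLFEICLGQYDDGKDCKLARGG"
print(most_hydrophilic_window(n_terminus))
```

A low mean score only suggests surface exposure; candidate immunogen regions identified this way would still need to be evaluated by the methods described in the "Anti-NOVX Antibodies" section.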

NOV9 

A disclosed NOV9 nucleic acid of 13700 nucleotides (also referred to as CG57625-01) 
encoding a protocadherin-like protein is shown in Table 9A. Putative untranslated regions 
upstream and/or downstream from the coding region, if any, are underlined, and the start and 
stop codons are in bold letters. 



Table 9A. NOV9 nucleotide sequence (SEQ ID NO:21).

TCTCTCTTTCTTCCCTCCAGGATGGAAGTATGATGTGA TGGATATAATTATGGGACACTG 
TGTGGGCACACGGCCTCCTGCTTGTTGCCTCATCCTCCTGCTTTTCAAGCTTTTGGCCAC 
TGTCTCCCAGGGGCTGCCAGGGACTGGACCCCTGGGCTTCCACTTCACACATTCCATTTA 
TAATGCTACCGTGTATGAGAACTCAGCAGCAAGGACCTACGTCAACAGCCAGAGTAGAAT 
GGGCATCACCTTAATAGATCTATCCTGGGATATCAAATACAGAATAGTGTCCGGAGACGA
GGAAGGCTTTTTCAAAGCAGAGGAAGTCATCATTGCAGATTTCTGTTTTCTCAGAATAAG 
AACTAAAGGTGGCAATTCTGCCATATTAAATAGGGAAATCCAGGATAATTATTTATTGAT 
AGTAAAAGGTTCTGTCAGAGGAGAGGATTTGGAAGCATGGACCAAAGTGAATATACAGGT 
TTTAGATATGAATGATCTGAGACCTTTGTTTTCACCCACAACATACTCTGTTACCATAGC 
AGAAAGCACACCTCTAAGGACTAGTGTTGCCCAGGTGACTGCAACAGACGCAGATATTGG 
TTCCAATGGAGAATTCTACTACTACTTTAAAAATAAAGTTGATCTCTTTTCAGTTCACCC 
CACGAGTGGTGTCATCTCCTTAAGTGGTCGATTAAATTATGATGAAAAGAATAGGTATGA 
TCTGGAAATTTTGGCTGTGGACCGGGGAATGAAACTGTATGGGAACAATGGAGTGAGCAG 
TACTGCAAAGCTTTATGTTCACATTGAGCGCATAAATGAACATGCCCCAACAATCCATGT 
AGTCACTCATGTTCCTTTCTCGTTGGAAAAAGAGCCAACATATGCAGTGGTGACAGTTGA 
TGACTTAGATGATGGAGCGAATGGAGAGATCGAATCTGTTTCCATTGTGGCTGGGGATCC 
TTTAGATCAGTTCTTCCTGGCTAAGGAAGGAAAGTGGTTGAATGAGTACAAGATTAAGGA 
GAGGAAGCAGATTGACTGGGAGAGCTTTCCCTATGGCTACAATCTCACTCTTCAAGCAAA 
AGACAAGGGATCTCCTCAAAAATGTTCAGCATTAAAGGCAGTCTACATTGGCAACCCCAC 
AAGAGACACTGTCCCCATTAGATTTGAAAAAGAAGTGTACGATGTGAGCATAAGTGAATT 
TTCCCCTCCTGGTGTCGTGGTTGCTATAGTAAAATTAAGTCCTGAACCGATAGATGTGGA 
ATACAAATTATCTCCTGGTGAGGATGCAGTGTACTTTAAAATTAATCCTCGGTCGGGTCT 
GATTGTTACAGCACGGCCACTGAATACTGTTAAGAAGGAGGTTTATAAACTGGAGGTGAC 
AAACAAGGAAGGAGATTTAAAAGCACAGGTCACCATCAGCATAGAAGATGCAAATGACCA 
CACCCCAGAATTTCAGCAACCACTGTATGATGCTTATGTGAATGAAAGTGTCCCAGTGGG 
AACCAGCGTTCTAACAGTTTCAGCTTCTGATAAGGATAAAGGAGAAAATGGGTACATCAC 
CTATAGTATCGCTAGCCTGAATTTGTTACCATTTGTCATTAATCAGTTTACAGGTGTTAT 
TAGCACAACTGAAGAACTGGATTTTGAATCCTCCCCAGAAATTTACAGATTCATTGTTAG
AGCCTCTGACTGGGGTTCACCATACCGCCATGAAAGTGAGGTCAATGTGACTATTCGAAT 
AGGAAATGTCAACGACAACAGCCCTCTCTTTGAAAAAGTGGCTTGCCAGGGAGTTATTTC 
ATATGACTTTCCAGTTGGTGGTCACATCACAGCAGTCTCAGCGATCGATATCGATGAACT 
TGAACTTGTAAAGTACAAAATCATTTCTGGAAATGAACTTGGCTTCTTTTATTTAAACCC 
AGATTCTGGTGTTTTACAGCTTAAAAAATCACTGACAAATTCTGGCATTAAAAATGGCAA 
TTTTGCCCTCAGAATTACAGCAACTGATGGAGAGAATCTTGCAGACCCCATGTCTATTAA 
CATTTCAGTCCTACATGGGAAAGTGTCTTCAAAGAGCTTCAGTTGCAGAGAAACTCGTGT 
GGCTCAAAAGCTGGCAGAGAAACTACTCATTAAGGCAAAAGCAAATGGGAAACTGAATCT 
GGAAGATGGATTTCTTGACTTTTATTCAATTAATAGACAGGGACCATATTTTGACAAGTC 
TTTTCCTTCTGATGTGGCTGTAAAGGAGGATCTGCCAGTTGGTGCTAACATTCTGAAGAT 
TAAAGCCTATGATGCCGACTCTGGCTTCAATGGAAAAGTGCTATTTACAATATCAGATGG 
AAATACGGATAGTTGCTTTAATATTGATATGGAGACTGGGCAGCTTAAAGTCCTTATGCC 
CATGGATCGAGAACACACAGACCTCTATCTCCTTAATATCACCATCTATGACTTAGGTAA 
TCCACAGAAATCGTCATGGAGACTGCTGACCATCAATGTGGAGGATGCTAATGACAATAG 
CCCAGTTTTTATTCAAGACAGTTACTCAGTTAACATTCTTGAAAGTTCAGGCATTGGTAC 
TGAAATCATTCAAGTGGAAGCCAGAGACAAAGACTTAGGTTCTAATGGTGAAGTGACTTA
CTCAGTCTTGACAGATACACAGCAGTTTGCCATCAATAGCTCAACTGGAATCGTTTATGT 
AGCCGACCAGTTGGACCGGGAATCCAAAGCCAATTATTCTTTGAAAATAGAAGCCAGGGA 
CAAGGCAGAGAGTGGTCAGCAGCTGTTTTCAGTTGTCACTCTTAAAGTTTTTTTAGATGA 
TGTCAATGACTGCTCCCCAGCTTTCATTCCCAGTAGCTATAGTGTGAAGGTTCTTGAAGA 
TCTCCCTGTTGGCACTGTCATTGCTTGGCTTGAGACCCATGATCCAGATCTTGGACTGGG 
GGGTCAAGTGCGCTATTCTTTGGTCAATGACTATAATGGGAGATTTGAAATAGATAAAGC 
AAGTGGTGCCATCCGCTTGAGCAAAGAGCTTGATTATGAGAAACAGCAGTTCTATAACCT 
TACTGTGCGGGCCAAAGACAAAGGGCGGCCTGTCTCTCTGTCATCTGTTTCCTTTGTTGA 
GGTGGAAGTGGTGGATGTCAATGAAAACCTCCACACTCCCTATTTCCCAGACTTTGCTGT 
TGTTGGATCTGTAAAGGAAAACTCACGCATTGGAACAAGCGTGCTGCAGGTGACTGCTCG 
AGATGAAGACTCCGGAAGGGATGGAGAGATCCAGTACTCCATCAGGGATGGCAGTGGTCT 
TGGAAGGTTCAGTATAGACGACGAGAGTGGGGTCATCACTGCCGCAGACATTCTTGATCG 
GGAGACAATGGGGTCATACTGGCTAACAGTGTATGCCACAGACAGGGGCGTTGTTCCACT 
CTACTCCACCATTGAGGTCTACATTGAAGTTGAAGATGTGAATGACAATGCCCCGCTGAC 
CTCAGAACCCCTATATATTATCCTGGTCATGGATAAACATCCGAAGGACGTATCTGTCAT 
TCAGATCCAGGCTGAAGATCCTGACTCCAGTTCCAATGAAAAACTGACATACAGGATTAC 
AAGTGGAAATCCTCAGAATTTTTTGTGCATCAATATCAAAACAGGGCTGATTACAACAAC 
TTCAAGGAAATTGGATCGAGAACAGCAGGCAGAACATTTTCTGGAGGTGACTGTGACAGA 
TGGTGGTCCCTCTCCAAAACAGTCAACCATTTGGGTGGTGGTTCAGGTTCTAGATGAAAA 
TGACAACAAGCCCCAGTTCCCAGAGAAGGTCTACCAGATCAAGCTGCCAGAACGTGACCG 
AAAGAAGAGAGGAGAACCGATTTACAGGGCTTTTGCATTTGATAGAGATGAGGGCCCCAA 
CGCAGAAATCTCCTACAGTATTGTGGATGGGAATGATGACGGAAAGTTCTTTATTGACCC 
TAAAACTGGGATGGTTTCTTCTAGAAAGCAGTTTACAGCAGGCAGTTATGACATCCTAAC 
GATAAAGGCAGTGGACAATGGGCGCCCACAGAAATCCTCCACGGCCCGCCTCCACATTGA 
ATGGATTAAGAAACCACCCCCTTCACCTATACCATTGACCTTCGATGAGCCGTTTTATAA 
CTTCACAGTCATGGAAAGTGATAGAGTGACTGAAATTGTAGGGGTGGTGTCTGTGCAGCC 
AGCTAACACCCCTCTGTGGTTTGACATAGTTGGGGGGAATTTTGACAGCGCTTTTGATGC 
AGAGAAGGGTGTTGGGACAATTGTCATCGCAAAACCTTTGGATGCAGAGCAGAGGTCCAT 
CTATAATATGAGTGTGGAAGTCACCGATGGGACAAATGTTGCTGTTACTCAGGTATTTAT 
CAAAGTGCTGGATAATAATGATAATGGCCCAGAATTCTCTCAGCCGAATTACGATGTGAC 
AATTTCCGAGGATGTGCTTCCAGACACGGAGATCCTGCAGATTGAAGCCACAGATAGAGA 
TGAGAAGCACAAGCTGAGCTACACTGTTCATAGCAGCATCGACTCCATCAGCATGAGAAA 
ATTCCGGATTGACCCTAGCACTGGCGTGCTCTATACTGCCGAGAGGCTGGACCATGAGGC 
CCAGGACAAGCACATTCTCAACATAATGGTCAGAGATCAGGAGTTTCCTTATCGAAGAAA 
CTTGGCCCGAGTCATTGTGAATGTGGAGGATGCTAATGATCACAGTCCTTATTTTACCAA 
CCCACTGTATGAAGCGTCTGTGTTTGAATCTGCTGCTCTGGGATCAGCTGTTCTGCAAGT 
GACGGCTCTGGACAAAGACAAAGGAGAAAATGCAGAACTCATATATACCATAGAAGCAGG 
GAACACTGGGAACATGTTTAAGATCGAACCGGTCCTAGGCATCATCACCATTTGCAAAGA 
ACCAGACATGACGACGATGGGTCAGTTTGTCCTATCCATCAAAGTCACAGATCAGGGATC 
CCCGCCAATGTCTGCTACTGCAATTGTGCGCATTTCCGTCACCATGTCTGACAATTCTCA 
CCCCAAGTTCATTCACAAAGACTACCAAGCAGAAGTAAATGAAAATGTTGACATTGGAAC 
ATCAGTCATTCTAATCTCTGCCATCAGTCAATCTACCCTCATTTATGAAGTCAAAGATGG 
AGACATTAATGGGATCTTTACCATAAATCCATATTCTGGAGTCATCACCACTCAGAAGGC 
CCTGGATTATGAGCGCACATCCTCTTATCAACTCATCATTCAGGCCACCAATATGGCAGG 
AATGGCTTCCAATGCTACAGTCAATATTCAGATTGTTGATGAAAATGATAATGCCCCAGT 
TTTTCTCTTTTCTCAATACTCAGGCAGCCTAAGTGAGGCTGCCCCAATTAATAGCATTGT
CAGGAGCTTGGATAACAGCCCACTGGTGATTCGAGCCACAGATGCTGACAGCAACCGGAA 
TGCTCTGCTTGTGTATCAGATTGTGGAGTCAACAGCAAAAAAGTTTTTCACGGTGGACTC 
CAGTACAGGTGCAATCAGAACAATTGCCAACCTGGACCATGAAACCATTGCCCATTTCCA 
TTTTCATGTGCATGTGAGAGACAGTGGTAGCCCCCAACTGACTGCAGAGAGTCCCGTTGA 
AGTCAACATTGAGGTGACAGATGTGAATGATAACCCACCTGTTTTTACTCAGGCTGTGTT 
TGAGACTATCTTACTTCTACCTACCTATGTTGGAGTGGAGGTTCTGAAAGTTAGTGCCAC 
AGATCCTGACTCTGAGGTACCCCCTGAACTGACATACAGCCTAATGGAAGGCAGTTTGGA 
TCATTTTTTAATTGACTCAAACAGTGGAGTACTTACCATAAAAAACAACAACCTCTCCAA 
GGATCACTACATGCTGATAGTTAAGGTGTCTGATGGAAAGTTCTACAGTACCTCCATGGT 
CACCATCATGGTTAAAGAAGCCATGGACAGCGGCCTCCACTTTACACAAAGCTTCTATTC 
CACCTCAATCTCAGAGAACAACACTAACATAACCAAAGTTGCTATTGTCAATGCAGTTGG 
AAATCGCCTTAATGAGCCCTTAAAATACAGCATCTTAAACCCAGGAAATAAGTTCAAGAT 
AAAATCTACCTCAGGGGTCATTCAGACGACTGGAGTCCCCTTTGACCGTGAAGAACAAGA 
GTTATATGAGCTGGTGGTAGAAGCCAGCCGTGAGCTGGACCATCTGCGTGTGGCCAGAGT 
GGTGGTCAGGGTTAACATTGAAGACATAAATGACAATTCTCCAGTCTTTGTGGGCCTCCC 
ATACTATGCTGCTGTTCAAGTGGATGCGGAACCCGGGACTCTGATTTATCAGGTGACAGC 
CATTGACAAAGATAAAGGTCCAAATGGAGAAGTGACCTATGTCCTGCAGGATGACTATGG 
CCACTTTGAAATTAACCCTAATTCAGGGAATGTTATTTTAAAGGAAGCATTCAACTCTGA 
CTTGTCCAACATTGAGTATGGAGTCACCATCCTAGCCAAGGATGGCGGAAAACCTTCTTT 
GTCTACATCTGTGGAGCTTCCCATCACTATTGTCAACAAAGCAATGCCTGTGTTTGATAA 
GCCCTTTTATACAGCATCTGTCAATGAAGACATCAGAATGAACACACCCATCCTAAGCAT 
CAATGCCACCAGTCCAGAAGGCCAAGGCATCATATATATCATTATCGATGGGGACCCTTT 
TAAACAGTTTAACATTGACTTTGACACTGGGGTCCTGAAAGTTGTTAGCCCTTTGGATTA 
TGAAGTTACATCTGCTTACAAGCTGACAATAAGAGCCAGCGACGCCCTTACTGGTGCTAG 
GGCTGAAGTCACTGTTGACTTGCTAGTTAATGATGTAAATGACAACCCCCCTATTTTCGA 
TCAGCCTACATACAATACAACACTATCAGAAGCATCTCTTATTGGGACACCTGTTTTACA 
AGTTGTCTCTATTGATGCAGACTCAGAAAACAATAAAATGGTACATTATCAGATTGTCCA 
GGATACCTACAATAGCACAGATTATTTTCACATAGATAGCTCAAGTGGCTTAATCCTGAC 
AGCACGAATGCTGGACCATGAGTTAGTACAACACTGCACTTTGAAAGTCAGATCAATAGA 
TAGTGGCTTCCCATCACTGAGCAGTGAGGTTCTCGTTCATATCTACATCTCTGATGTAAA 
TGACAACCCTCCAGTTTTTAATCAGCTCATTTATGAGTCATATGTGAGTGAATTAGCCCC 
CCGGGGCCATTTTGTAACCTGTGTACAAGCCTCTGATGCAGACAGCTCTGATTTTGACCG 
GTTGGAATATAGCATTTTATCTGGGAATGACCGGACGAGCTTTCTGATGGACAGCAAGAG 
TGGAGTTATCACATTGTCCAACCATCGGAAGCAGCGGATGGAGCCTCTGTACAGTCTCAA 
TGTGTCTGTCTCTGATGGGTTGTTCACCAGCACTGCACAGGTGCATATTAGGGTACTTGG 
GGCTAACTTGTACAGCCCTGCCTTTTCACAAAGCACATACGTAGCTGAGGTGAGAGAGAA 
CGTGGCTGCAGGAACAAAGGTAATTCATGTTCGAGCCACAGATGGTGATCCAGGGACTTA 
TGGGCAGATCAGCTATGCCATCATCAATGACTTTGCCAAGGATCGATTCCTCATAGACAG 
CAATGGGCAGGTCATCACCACAGAAAGGCTAGACCGGGAAAACCCTCTAGAAGGGGATGT 
TAGTATTTTTGTGAGGGCCCTTGATGGTGGAGGGAGAACAACTTTCTGCACTGTGAGAGT 
GATTGTTGTGGATGAAAATGACAATGCTCCCCAGTTCATGACAGTGGAATATAGAGCCAG 
TGTCAGGGCAGATGTTGGAAGGGGCCACTTGGTCACTCAAGTTCAAGCCATAGATCCCGA 
TGATGGAGCAAATTCAAGGATTACTTATTCCCTCTATAGCGAGGCCTCTGTTTCAGTGGC 
CGACCTCCTGGAAATCGATCCTGACAATGGCTGGATGGTCACAAAGGGTAATTTTAACCA 
GCTGAAAAATACAGTGCTTTCGTTCTTTGTCAAAGCAGTAGATGGGGGCATCCCAGTAAA
GCACTCCCTCATTCCTGTCTATATCCACGTCTTGCCCCCTGAAACGTTCTTGCCATCATT 
CACCCAGTCTCAGTATTCCTTTACCATTGCAGAAGATACAGCCATTGGGAGTACAGTGGA 
CACCCTGAGGATTTTGCCCAGTCAGAATGTCTGGTTCAGCACAGTTAATGGGGAACGGCC 
AGAAAATAACAAAGGGGGCATATTCGTCATAGAACAGGAAACAGGCACTATTAAGCTTGA 
CAAACGCCTTGACCGTGAAACCAGCCCAGCTTTCCACTTTAAAGTAGCAGCCACTATACC 
CCTGGACAAAGTAGACATTGTGTTTACTGTGGATGTAGATATCAAGGTATTGGATTTGAA 
TGACAACAAGCCAGTCTTTGAAACTTCAAGCTATGACACCATTATAATGGAAGGGATGCC 
TGTTGGCACCAAACTCACACAAGTGAGAGCTATTGATATGGACTGGGGAGCCAATGGACA 
AGTCACTTACTCCCTCCACTCGGATTCCCAGCCCGAAAAGGTAATGGAAGCATTCAATAT 
TGACAGCAACACGGGCTGGATCAGTACCTTGAAGGACCTAGATCACGAGACAGACCCCAC 
ATTCACCTTCTCTGTGGTGGCCTCTGACCTTGGAGAGGCATTCTCTCTTTCCTCCACGGC 
CTTGGTCTCTGTCAGAGTGACAGATATAAATGACAATGCACCAGTCTTCGCGCAGGAAGT 
GTACCGAGGGAATGTGAAGGAGAGCGACCCACCGGGCGAGGTGGTAGCCGTCCTCAGCAC 
CTGGGACAGAGACACATCCGACGTTAATCGCCAAGTGAGCTACCATATTACAGGTGGAAA 
CCCTCGAGGAAGGTTTGCTCTGGGCCTGGTGCAAAGTGAGTGGAAGGTCTATGTGAAGAG 
GCCTCTAGACAGAGAAGAACAGGACATTTACTTTCTCAATATCACTGCCACTGATGGGCT 
TTTTGTCACACAGGCCATGGTGGAAGTGAGCGTCAGTGATGTGAATGACAATAGCCCAGT
GTGTGATCAGGTTGCATATACAGCATTACTTCCTGAAGACATTCCATCAAATAAAATCAT 
CCTGAAAGTCAGTGCAAAGGATGCTGATATTGGATCCAATGGATATATACGATACTCACT 
CTATGGATCTGGAAACAGTGAATTTTTTCTAGATCCAGAAAGTGGCGAGTTAAAAACCTT 
GGCTCTGTTGGACCGGGAGAGGATCCCCGTGTACAGCCTGATGGCCAAGGCCACTGACGG 
GGGTGGCAGGTTCTGCCAGTCCAACATCCACCTAATCCTGGAGGATGTGAATGATAACCC 
CCCTGTGTTTTCTTCTGACCACTACAACACCTGTGTCTATGAGAACACAGCCACCAAGGC 
TCTGTTGACCAGAGTTCAAGCCGTGGACCCCGACATTGGCATCAATAGGAAGGTCGTGTA 
CTCCCTGGCAGACTCAGCTGGTGGGGTCTTCTCCATTGACAGCTCATCTGGCATCATCAT 
CCTGGAGCAGCCACTGGACCGTGAGCAGCAGTCTTCGTACAACATCAGCGTGCGGGCCAC 
TGACCAGAGTCCTGGACAGTCCCTGTCCTCTCTCACTACTGTCACCATCACCGTTCTGGA 
CATTAATGACAACCCCCCTGTGTTTGAGAGGAGGGACTACCTGGTGACGGTGCCTGAGGA 
CACCTCCCCTGGCACCCAAGTCCTTGCTGTTTTTGCCACCAGCAAAGATATTGGCACAAA 
TGCTGAGATCACTTATCTCATCCGGTCTGGGAACGAACAAGGGAAATTTAAGATCAACCC 
CAAGACAGGGGGTATTTCTGTCTCTGAAGTCCTGGACTATGAATTATGCAAAAGGTTTTA 
CCTGGTAGTGGAAGCCAAAGATGGGGGCACCCCAGCTCTCAGCGCTGTGGCCACTGTCAA 
CATCAACCTCACAGATGTTAATGACAACCCTCCCAAGTTCAGCCAAGACGTCTACAGTGC 
GGTTATCAGTGAAGACGCCTTGGTGGGAGACTCTGTCATTTTGCTAATAGCAGAAGATGT 
AGACAGCCAGCCCAACGGACAGATTCATTTTTCCATTGTGAATGGAGATCGGGACAATGA 
ATTTACTGTAGATCCTGTCTTGGGACTTGTGAAAGTTAAGAAGAAATTGGACCGGGAACG 
GGTGTCTGGATACTCTCTGCTTGTCCAGGCCGTAGACAGTGGCATTCCTGCAATGTCATC 
AACTGCAACTGTCAACATTGATATTTCTGATGTGAATGACAACAGCCCGGTGTTTACACC 
TGCCAACTATACTGCTGTGATTCAGGAAAATAAGCCAGTGGGCACCAGCATCTTGCAGCT 
GGTGGTGACAGACAGAGACTCCTTTCACAATGGGCCTCCCTTTTCATTCTCTATTTTGTC 
GGGAAATGAAGAGGAGGAGTTTGTGTTGGACCCTCATGGGATCTTGCGGTCGGCTGTGGT 
CTTCCAGCACACAGAGTCTCTGGAATACGTGTTGTGTGTCCAGGCAAAGGATTCAGGCAA 
ACCCCAGCAAGTTTCTCACACTTACATCCGCGTGCGAGTCATTGAGGAAAGCACCCACAA 
GCCCACAGCCATTCCCCTGGAAATTTTCATTGTCACCATGGAGGATGACTTTCCTGGTGG 
GGTCATTGGGAAGATTCATGCCACAGATCAAGACATGTATGATGTGCTCACATTTGCCCT 
GAAATCGGAGCAGAAAAGCTTATTTAAAGTGAACAGTCACGATGGGAAAATCATCGCCCT 
GGGAGGCCTGGACAGCGGCAAGTATGTCCTGAATGTGTCTGTGAGTGATGGTCGCTTCCA 
GGTACCCATTGATGTGGTCGTGCATGTGGAGCAGTTGGTGCATGAGATGCTGCAGAACAC 
TGTCACCATCCGCTTTGAAAATGTGTCCCCTGAGGACTTCGTGGGGCTGCACATGCATGG 
GTTCCGGCGCACCCTGCGGAATGCAGTCCTCACCCAGAAGCAGGACAGCCTGCGCATCAT 
CAGCATCCAGCCCGTGGCAGGCACCAACCAACTGGACATGCTGTTTGCGGTGGAGATGCA 
CAGCAGCGAGTTCTACAAGCCAGCCTACCTGATCCAGAAGCTGTCCAATGCTAGAAGACA 
CCTGGAGAATATCATGCGCATCTCAGCCATCTTGGAGAAGAACTGCTCAGGGCTGGACTG 
TCAGGAACAGCATTGTGAGCAAGGCTTGTCACTCGATTCCCACGCGCTCATGACCTACAG 
CACGGCTCGCATCAGCTTTGTGTGTCCGCGTTTCTACAGGAACGTGCGTTGCACCTGCAA 
TGGTGGACTGTGTCCGGGGTCCAACGATCCTTGTGTGGAGAAGCCGTGTCCAGGGGACAT 
GCAGTGTGTCAGTTATGAAGCCAGCAGGAGACCGTTCCTCTGCCAGTGTCCACCAGGGAA 
GCTCGGAGAGTGCTCAGGGCACACTTCTCTCAGCTTTGCTGGAAACAGTTACATCAAATA 
TCGGCTTTCTGAAAATAGCAAAGAAGAGGATTTCAAACTAGCTCTGCGTCTTCGAACACT 
GCAAAGCAATGGGATTATAATGTACACCAGAGCAAATCCCTGCATAATTCTGAAGCAGAT
TGTGGATGGCAAGCTGTGGTTCCAGCTGGACTGCGGCAGCGGCCCTGGAATCTTGGGCAT 
CTCGGGCCGTGCTGTCAACGACGGGAGCTGGCACTCGGTCTTCCTGGAGCTCAACCGCAA 
TTTCACGAGCCTGTCCCTGGATGACAGCTACGTGGAGCGGCGCCGGGCGCCCCTCTACTT 
CCAGACGCTGAGCACTGAGAGTAGCATCTACTTCGGCGCCCTGGTGCAAGCGGATAACAT 
CCGCAGCCTGACTGACACGCGGGTCACGCAGGTGCTCAGCGGCTTCCAGGGCTGCCTGGA 
CTCGGTGATACTGAATAACAATGAGCTGCCGCTGCAGAACAAGCGCAGCAGCTTCGCGGA 
GGTGGTGGGCCTGACGGAGCTGAAGCTGGGCTGCGTGCTCTATCCCGACGCCTGCAAGCG 
CAGCCCGTGCCAGCACGGGGGCAGCTGCACTGGCCTGCCATCGGGGGGTGGCTATCAGTG 
TACCTGTCTCTCACAGTTTACGGGGAGAAACTGTGAATCTGAGATTACAGCCTGCTTCCC 
AAACCCCTGCCGGAATGGAGGATCCTGCGATCCAATAGGAAACACTTTCATCTGCAATTG 
TAAAGCTGGGCTCACTGGAGTCACGTGTGAGGAGGACATCAATGAGTGCGAACGAGAGGA 
GTGTGAGAACGGAGGCTCCTGCGTGAACGTGTTCGGCTCCTTCCTCTGCAACTGCACGCC 
GGGCTACGTGGGCCAGTACTGCGGGCTGCGCCCCGTGGTGGTACCCAATATCCAGGCTGG 
CCACTCCTACGTGGGGAAGGAGGAGCTCATCGGCATCGCCGTGGTCCTCTTCGTCATCTT 
CATCCTGGTGGTTCTCTTCATAGTCTTCCGCAAGAAGGTCTTCCGCAAGAACTACTCCCG 
CAACAACATCACGCTAGTGCAGGACCCGGCCACCGCCGCCCTGCTTAACAAGAGCAATGG 
CATCCCGTTCCGGAACCTGCGCGGCAGTGGGGACGGCCGCAACGTCTACCAGGAGGTGGG 
GCCCCCGCAGGTCCCCGTGCGCCCCATGGCCTACACACCCTGCTTCCAGAGTGACTCCAG 
GAGCAACCTGGATAAGATCGTGGACGGGCTGGGAGGCGAGCACCAGGAAATGACCACGTT
TCACCCTGAGTCGCCCCGCATCCTGACAGCCCGGCGGGGCGTGGTCGTGTGCAGTGTGGC 
CCCCAACCTCCCCGCCGTGTCACCCTGCCGCTCCGACTGCGACTCCATCCGGAAGAATGG 
CTGGGACGCGGGAACTGAGAGTAATAAAGGCAGCAACTCTGAAGTTCAGTCCCTCAGCTC 
CTTCCAGTCAGATTCTGGTGACGACAATGCCTATCACTGGGACACCTCTGATTGGATGCC 
AGGGGCCCGCCTGTCGGACATAGAGGAAGTGCCCAACTATGAGAACCAGGATGGAGGGTC 
TGCACACCAGGGGAGCACACGGGAGCTGGAGAGCGATTACTACCTGGGTGGTTATGACAT 
TGACAGTGAATACCCACCCCCTCATGAAGAGGAGTTCTTGAGTCAGGACCAGCTGCCTCC 
TCCTCTCCCGGAGGACTTCCCAGACCAATATGAGGCCCTGCCACCCTCCCAGCCTGTCTC 
CCTGGCCAGCACACTGAGCCCAGACTGCAGGAGAAGGCCCGAGTTTCATCCTAGCCAGTA 
TCTCCCTCCTCACCCATTCCCCAACGAAACGGATTTGGTGGGCCCGCCTGCCAGCTGTGA 
ATTTAGTACTTTTGCTGTGAGCATGAACCAGGGCACAGAGCCCACAGGCCCAGCAGACAG 
CGTGTCTCTGTCCTTGCACAATTCCAGAGGCACCTCATCCTCGGATGTGTCTGCCAACTG 
CGGCTTTGACGATTCCGAAGTAGCCATGAGTGACTACGAGAGCGTGGGAGAGCTCAGCCT 
CGCCAGCCTTCACATTCCCTTTGTGGAGACTCAGCATCAGACTCAAGTGTAGACATCACA 
TCTTGGGTACTTCACCCTGT 



In a search of public sequence databases, the NOV9 nucleic acid sequence, located on
chromosome 11, has 4976 of 7882 bases (63%) identical to a
gb:GENBANK-ID:AF100960|acc:AF100960.1 mRNA from Rattus norvegicus (Rattus norvegicus
protocadherin (Fat) mRNA, complete cds). Public nucleotide databases include all GenBank
databases and the GeneSeq patent database.
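The percentage quoted above can be reproduced from the two counts. The following is a minimal sketch, assuming the figure is the number of matching bases divided by the aligned length and truncated to a whole number; the truncation rule and the function name `percent_identity` are assumptions for illustration and are not stated in the text.

```python
def percent_identity(matches: int, aligned: int) -> int:
    """Whole-number percent identity, truncated rather than rounded (assumed rule)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# Counts quoted above for the NOV9 nucleic acid database hit.
nov9_nucleotide_identity = percent_identity(4976, 7882)
print(nov9_nucleotide_identity)
```

The same helper applies to any "X of Y" identity or similarity figure reported for an alignment.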

The disclosed NOV9 polypeptide (SEQ ID NO:22) encoded by SEQ ID NO:21 has
4544 amino acid residues and is presented in Table 9B using the one-letter amino acid code.
Signal P, Psort and/or Hydropathy results predict that NOV9 has a signal peptide and is likely
to be localized extracellularly with a certainty of 0.4600. The most likely cleavage site is
between residues 32 and 33.
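A cleavage site "between residues 32 and 33" can be illustrated by splitting the sequence at that position. The sketch below uses only the first 60 residues of the NOV9 sequence from Table 9B; the function name `split_signal_peptide` is hypothetical, and the split simply follows the predicted site rather than any experimental result.

```python
def split_signal_peptide(protein: str, cleavage_after: int) -> tuple[str, str]:
    """Split a protein into (signal peptide, remainder) after a 1-based residue number."""
    if not 0 < cleavage_after < len(protein):
        raise ValueError("cleavage position must fall inside the sequence")
    return protein[:cleavage_after], protein[cleavage_after:]


# First 60 residues of NOV9 as listed in Table 9B.
NOV9_N_TERMINUS = "MDIIMGHCVGTRPPACCLILLLFKLLATVSQGLPGTGPLGFHFTHSIYNATVYENSAART"

signal, remainder = split_signal_peptide(NOV9_N_TERMINUS, 32)
print(signal)
print(remainder)
```

Under this prediction, residue 33 (L) would be the first residue after the cleaved signal peptide.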



Table 9B. Encoded NOV9 protein sequence (SEQ ID NO:22).

MDIIMGHCVGTRPPACCLILLLFKLLATVSQGLPGTGPLGFHFTHSIYNATVYENSAART 
YVNSQSRMGITLIDLSWDIKYRIVSGDEEGFFKAEEVIIADFCFLRIRTKGGNSAILNRE 
IQDNYLLIVKGSTOGEDLEAWTKWIQVLDMNDLRPLFSPTTYSVTIAESTPLRTSVAQV 
TATDADIGSNGEFYYYFKNKVDLFSVHPTSGVISLSGRLNYDEKNRYDLEILAVDRGMKL
YGNNGVSSTAKLYVHIERINEHAPTIHVVTHVPFSLEKEPTYAVVTVDDLDDGANGEIES
VSIVAGDPLDQFFLAKEGKWLNEYKIKERKQIDWESFPYGYNLTLQAKDKGSPQKCSALK
AVYIGNPTRDTVPIRFEKEVYDVSISEFSPPGVVVAIVKLSPEPIDVEYKLSPGEDAVYF
KINPRSGLIVTARPLNTVKKEVYKLEVTNKEGDLKAQVTISIEDANDHTPEFQQPLYDAY
VNESVPVGTSVLTVSASDKDKGENGYITYSIASLNLLPFVINQFTGVISTTEELDFESSP
EIYRFIVRASDWGSPYRHESEVNVTIRIGNVNDNSPLFEKVACQGVISYDFPVGGHITAV
SAIDIDELELVKYKIISGNELGFFYLNPDSGVLQLKKSLTNSGIKNGNFALRITATDGEN
LADPMSINISVLHGKVSSKSFSCRETRVAQKLAEKLLIKAKANGKLNLEDGFLDFYSINR
QGPYFDKSFPSDVAVKEDLPVGANILKIKAYDADSGFNGKVLFTISDGNTDSCFNIDMET
GQLKVLMPMDREHTDLYLLNITIYDLGNPQKSSWRLLTINVEDANDNSPVFIQDSYSVNI
LESSGIGTEIIQVEARDKDLGSNGEVTYSVLTDTQQFAINSSTGIVYVADQLDRESKANY
SLKIEARDKAESGQQLFSVVTLKVFLDDVNDCSPAFIPSSYSVKVLEDLPVGTVIAWLET
HDPDLGLGGQVRYSLVNDYNGRFEIDKASGAIRLSKELDYEKQQFYNLTVRAKDKGRPVS
LSSVSFVEVEVVDVNENLHTPYFPDFAVVGSVKENSRIGTSVLQVTARDEDSGRDGEIQY
SIRDGSGLGRFSIDDESGVITAADILDRETMGSYWLTVYATDRGVVPLYSTIEVYIEVED
VNDNAPLTSEPLYIILVMDKHPKDVSVIQIQAEDPDSSSNEKLTYRITSGNPQNFLCINI
KTGLITTTSRKLDREQQAEHFLEVTVTDGGPSPKQSTIWVVVQVLDENDNKPQFPEKVYQ
IKLPERDRKKRGEPIYRAFAFDRDEGPNAEISYSIVDGNDDGKFFIDPKTGMVSSRKQFT
AGSYDILTIKAVDNGRPQKSSTARLHIEWIKKPPPSPIPLTFDEPFYNFTVMESDRVTEI
VGVVSVQPANTPLWFDIVGGNFDSAFDAEKGVGTIVIAKPLDAEQRSIYNMSVEVTDGTN
VAVTQVFIKVLDNNDNGPEFSQPNYDVTISEDVLPDTEILQIEATDRDEKHKLSYTVHSS
IDSISMRKFRIDPSTGVLYTAERLDHEAQDKHILNIMVRDQEFPYRRNLARVIVNVEDAN
DHSPYFTNPLYEASVFESAALGSAVLQVTALDKDKGENAELIYTIEAGNTGNMFKIEPVL
GIITICKEPDMTTMGQFVLSIKVTDQGSPPMSATAIVRISVTMSDNSHPKFIHKDYQAEV
NENVDIGTSVILISAISQSTLIYEVKDGDINGIFTINPYSGVITTQKALDYERTSSYQLI
IQATNMAGMASNATVNIQIVDENDNAPVFLFSQYSGSLSEAAPINSIVRSLDNSPLVIRA
TDADSNRNALLVYQIVESTAKKFFTVDSSTGAIRTIANLDHETIAHFHFHVHVRDSGSPQ
LTAESPVEVNIEVTDVNDNPPVFTQAVFETILLLPTYVGVEVLKVSATDPDSEVPPELTY
SLMEGSLDHFLIDSNSGVLTIKNHNLSKDHYMLIVKVSDGKFYSTSMVTIMVKEAMDSGL
HFTQSFYSTSISENNTNITKVAIVNAVGNRLNEPLKYSILNPGNKFKIKSTSGVIQTTGV
PFDREEQELYELVVEASRELDHLRVARVVVRVNIEDINDNSPVFVGLPYYAAVQVDAEPG
TLIYQVTAIDKDKGPNGEVTYVLQDDYGHFEINPNSGNVILKEAFNSDLSNIEYGVTILA
KDGGKPSLSTSVELPITIVNKAMPVFDKPFYTASVNEDIRMNTPILSINATSPEGQGIIY
IIIDGDPFKQFNIDFDTGVLKVVSPLDYEVTSAYKLTIRASDALTGARAEVTVDLLVNDV
NDNPPIFDQPTYNTTLSEASLIGTPVLQVVSIDADSENNKMVHYQIVQDTYNSTDYFHID
SSSGLILTARMLDHELVQHCTLKVRSIDSGFPSLSSEVLVHIYISDVNDNPPVFNQLIYE
SYVSELAPRGHFVTCVQASDADSSDFDRLEYSILSGNDRTSFLMDSKSGVITLSNHRKQR
MEPLYSLNVSVSDGLFTSTAQVHIRVLGANLYSPAFSQSTYVAEVRENVAAGTKVIHVRA
TDGDPGTYGQISYAIINDFAKDRFLIDSNGQVITTERLDRENPLEGDVSIFVRALDGGGR
TTFCTVRVIVVDENDNAPQFMTVEYRASVRADVGRGHLVTQVQAIDPDDGANSRITYSLY
SEASVSVADLLEIDPDNGWMVTKGNFNQLKNTVLSFFVKAVDGGIPVKHSLIPVYIHVLP
PETFLPSFTQSQYSFTIAEDTAIGSTVDTLRILPSQNVWFSTVNGERPENNKGGIFVIEQ
ETGTIKLDKRLDRETSPAFHFKVAATIPLDKVDIVFTVDVDIKVLDLNDNKPVFETSSYD
TIIMEGMPVGTKLTQVRAIDMDWGANGQVTYSLHSDSQPEKVMEAFNIDSNTGWISTLKD
LDHETDPTFTFSVVASDLGEAFSLSSTALVSVRVTDINDNAPVFAQEVYRGNVKESDPPG
EVVAVLSTWDRDTSDVNRQVSYHITGGNPRGRFALGLVQSEWKVYVKRPLDREEQDIYFL
NITATDGLFVTQAMVEVSVSDVNDNSPVCDQVAYTALLPEDIPSNKIILKVSAKDADIGS
NGYIRYSLYGSGNSEFFLDPESGELKTLALLDRERIPVYSLMAKATDGGGRFCQSNIHLI
LEDVNDNPPVFSSDHYNTCVYENTATKALLTRVQAVDPDIGINRKVVYSLADSAGGVFSI
DSSSGIIILEQPLDREQQSSYNISVRATDQSPGQSLSSLTTVTITVLDINDNPPVFERRD
YLVTVPEDTSPGTQVLAVFATSKDIGTNAEITYLIRSGNEQGKFKINPKTGGISVSEVLD
YELCKRFYLVVEAKDGGTPALSAVATVNINLTDVNDNPPKFSQDVYSAVISEDALVGDSV
ILLIAEDVDSQPNGQIHFSIVNGDRDNEFTVDPVLGLVKVKKKLDRERVSGYSLLVQAVD
SGIPAMSSTATVNIDISDVNDNSPVFTPANYTAVIQENKPVGTSILQLVVTDRDSFHNGP
PFSFSILSGNEEEEFVLDPHGILRSAVVFQHTESLEYVLCVQAKDSGKPQQVSHTYIRVR
VIEESTHKPTAIPLEIFIVTMEDDFPGGVIGKIHATDQDMYDVLTFALKSEQKSLFKVNS
HDGKIIALGGLDSGKYVLNVSVSDGRFQVPIDVVVHVEQLVHEMLQNTVTIRFENVSPED
FVGLHMHGFRRTLRNAVLTQKQDSLRIISIQPVAGTNQLDMLFAVEMHSSEFYKPAYLIQ
KLSNARRHLENIMRISAILEKNCSGLDCQEQHCEQGLSLDSHALMTYSTARISFVCPRFY
RNVRCTCNGGLCPGSNDPCVEKPCPGDMQCVSYEASRRPFLCQCPPGKLGECSGHTSLSF
AGNSYIKYRLSENSKEEDFKLALRLRTLQSNGIIMYTRANPCIILKQIVDGKLWFQLDCG
SGPGILGISGRAVNDGSWHSVFLELNRNFTSLSLDDSYVERRRAPLYFQTLSTESSIYFG
ALVQADNIRSLTDTRVTQVLSGFQGCLDSVILNNNELPLQNKRSSFAEVVGLTELKLGCV
LYPDACKRSPCQHGGSCTGLPSGGGYQCTCLSQFTGRNCESEITACFPNPCRNGGSCDPI
GNTFICNCKAGLTGVTCEEDINECEREECENGGSCVNVFGSFLCNCTPGYVGQYCGLRPV
VVPNIQAGHSYVGKEELIGIAVVLFVIFILVVLFIVFRKKVFRKNYSRNNITLVQDPATA
ALLNKSNGIPFRNLRGSGDGRNVYQEVGPPQVPVRPMAYTPCFQSDSRSNLDKIVDGLGG
EHQEMTTFHPESPRILTARRGVVVCSVAPNLPAVSPCRSDCDSIRKNGWDAGTESNKGSN
SEVQSLSSFQSDSGDDNAYHWDTSDWMPGARLSDIEEVPNYENQDGGSAHQGSTRELESD
YYLGGYDIDSEYPPPHEEEFLSQDQLPPPLPEDFPDQYEALPPSQPVSLASTLSPDCRRR
PQFHPSQYLPPHPFPNETDLVGPPASCEFSTFAVSMNQGTEPTGPADSVSLSLHNSRGTS
SSDVSANCGFDDSEVAMSDYESVGELSLASLHIPFVETQHQTQV



A search of sequence databases reveals that the NOV9 amino acid sequence has 2201
of 4118 amino acid residues (53%) identical to, and 2931 of 4118 amino acid residues (71%)
similar to, the 4589 amino acid residue ptnr:SPTREMBL-ACC:Q9WU10 protein from Rattus
norvegicus (Rat) (PROTOCADHERIN). Public amino acid databases include the GenBank
databases, SwissProt, PDB and PIR.

NOV9 is expressed in at least breast, prostate, bone marrow, brain, liver, stomach,
pituitary, and cartilage. This information was derived by determining the tissue sources of the
sequences that were included in the invention, including but not limited to SeqCalling sources,
Public EST sources, Literature sources, and/or RACE sources.

The disclosed NOV9 polypeptide has homology to the amino acid sequences shown in 
the BLASTP data listed in Table 9C. 



Table 9C. BLAST results for NOV9

Gene Index/Identifier                      Protein/Organism                                               Length (aa)  Identity (%)     Positives (%)    Expect
gi|18578022|ref|XP_061871.2| (XM_061871)   similar to FAT tumor suppressor (Drosophila)                   3446         3319/3386 (98%)  3320/3386 (98%)  0.0
gi|4885229|ref|NP_005236.1| (NM_005245)    FAT tumor suppressor precursor                                 4620         2381/4620 (51%)  3178/4620 (68%)  0.0
gi|13929168|ref|NP_114007.1| (NM_031819)   FAT tumor suppressor (Drosophila) homolog [Rattus norvegicus]  4589         2368/4619 (51%)  3170/4619 (68%)  0.0
gi|6688786|emb|CAB65271.1| (AJ250768)      mouse fat 1 cadherin [Mus musculus]                            4587         2367/4633 (51%)  3167/4633 (68%)  0.0
gi|13787217|ref|NP_001438.1| (NM_001447)   FAT tumor suppressor 2 precursor [Homo sapiens]                4349         1854/4434 (41%)  2635/4434 (58%)  0.0



Table 9D lists the domain descriptions from DOMAIN analysis results against NOV9. 
This indicates that the NOV9 sequence has properties similar to those of other proteins known 
to contain this domain. 



Table 9D. Domain Analysis of NOV9 

gnl|Smart|smart00112, CA, Cadherin repeats.; Cadherins are
glycoproteins involved in Ca2+-mediated cell-cell adhesion. Cadherin
domains occur as repeats in the extracellular regions which are
thought to mediate cell-cell contact when bound to calcium.

CD-Length = 82 residues, 100.0% aligned 

Score = 97.4 bits (241), Expect = 2e-20 

Table 9E. Domain Analysis of NOV9

gnl|Pfam|pfam00028, cadherin, Cadherin domain

CD-Length = 92 residues, 100.0% aligned
Score = 94.4 bits (233), Expect = 1e-19



NOV9 is a member of the protocadherin family, which in turn is one of the six
subfamilies of the cadherin superfamily. Cadherins are membrane-associated glycoproteins
that mediate cell-cell interactions in a calcium-dependent fashion. Protocadherins may act as
cell-cell recognition molecules and may be involved in signal transduction cascades.

NOV9 has homology to the rat protocadherin that is most related to the Drosophila
FAT gene. The Drosophila FAT gene shows the presence of multiple characteristic cadherin
domains and is likely involved in cell guidance, cell repulsion and/or cell adhesion. Recessive
lethal mutations in the fat locus of Drosophila cause hyperplastic, tumor-like overgrowth of
larval imaginal discs, defects in differentiation and morphogenesis, and death during the pupal
stage. This indicates that the fat gene has a tumor suppressor function (See Mahoney et al.,
Cell 1991 Nov 29;67(5):853-68).

The disclosed NOV9 nucleic acid of the invention encoding a protocadherin-like
protein includes the nucleic acid whose sequence is provided in Table 9A or a fragment
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may
be changed from the corresponding base shown in Table 9A while still encoding a protein that
maintains its protocadherin-like activities and physiological functions, or a fragment of such a
nucleic acid. The invention further includes nucleic acids whose sequences are
complementary to those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 37 percent of the bases may be so changed.
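The "up to about 37 percent of the bases" limit can be checked for a candidate variant by counting the positions at which it differs from the reference. The sketch below is illustrative only: it assumes substitution-only variants of the same length as the reference, the toy sequences are hypothetical and are not taken from Table 9A, and the names `percent_changed` and `within_limit` are not part of the disclosure.

```python
def percent_changed(reference: str, variant: str) -> float:
    """Percentage of positions at which the variant differs from the reference."""
    if not reference or len(reference) != len(variant):
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for r, v in zip(reference, variant) if r != v)
    return 100.0 * differences / len(reference)


def within_limit(reference: str, variant: str, limit_percent: float = 37.0) -> bool:
    """True when the share of changed bases does not exceed the stated limit."""
    return percent_changed(reference, variant) <= limit_percent


# Hypothetical 10-base sequences, for illustration only.
reference = "ACGTACGTAC"
variant_a = "ACGTTCGTAA"   # differs at 2 of 10 positions
variant_b = "TCGAACCTAA"   # differs at 4 of 10 positions

print(percent_changed(reference, variant_a), within_limit(reference, variant_a))
print(percent_changed(reference, variant_b), within_limit(reference, variant_b))
```

Insertions and deletions would require an alignment step before counting, which this sketch deliberately leaves out.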

The disclosed NOV9 protein of the invention includes the protocadherin-like protein
whose sequence is provided in Table 9B. The invention also includes a mutant or variant
protein any of whose residues may be changed from the corresponding residue shown in Table
9B while still encoding a protein that maintains its protocadherin-like activities and
physiological functions, or a functional fragment thereof. In the mutant or variant protein, up
to about 47 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this protocadherin-like
protein (NOV9) may function as a member of a "protocadherin family". Therefore, the NOV9
nucleic acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV9 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the protocadherin-like
protein (NOV9) may be useful in gene therapy, and the protocadherin-like protein (NOV9)
may be useful when administered to a subject in need thereof. By way of nonlimiting
example, the compositions of the present invention will have efficacy for treatment of patients
suffering from hemophilia, hypercoagulation, idiopathic thrombocytopenic purpura,
autoimmune disease, allergies, immunodeficiencies, transplantation, graft versus host disease,
endocrine dysfunctions, diabetes, obesity, growth and reproductive disorders, Von
Hippel-Lindau (VHL) syndrome, cirrhosis, hypercalcemia, ulcers, Alzheimer's disease, stroke,
tuberous sclerosis, Parkinson's disease, Huntington's disease, cerebral palsy, epilepsy,
Lesch-Nyhan syndrome, multiple sclerosis, ataxia-telangiectasia, leukodystrophies, behavioral
disorders, addiction, anxiety, pain, neurodegeneration, cancer, tissue degeneration,
bacterial/viral/parasitic infections, or other pathologies or conditions. The NOV9 nucleic acid
encoding the protocadherin-like protein of the invention, or fragments thereof, may further be
useful in diagnostic applications, wherein the presence or amount of the nucleic acid or the
protein is to be assessed.



NOV9 nucleic acids and polypeptides are further useful in the generation of antibodies
that bind immunospecifically to the novel NOV9 substances for use in therapeutic or
diagnostic methods. These antibodies may be generated according to methods known in the
art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies"
section below. The disclosed NOV9 proteins have multiple hydrophilic regions, each of
which can be used as an immunogen. These novel proteins can be used in assay systems for
functional analysis of various human disorders, which will help in understanding the pathology
of the disease and in the development of new drug targets for various disorders.
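Prediction "from hydrophobicity charts" can be sketched as a sliding-window average over a per-residue hydropathy scale, with the lowest-scoring windows marking the most hydrophilic stretches. The text does not name a scale or window size; the Kyte-Doolittle scale and the 9-residue window used here are assumptions chosen for illustration, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in protein]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def most_hydrophilic_window(protein: str, window: int = 9) -> tuple[int, str]:
    """0-based start and residues of the window with the lowest mean hydropathy."""
    profile = hydropathy_profile(protein, window)
    start = min(range(len(profile)), key=profile.__getitem__)
    return start, protein[start:start + window]


# First 60 residues of NOV9 as listed in Table 9B.
fragment = "MDIIMGHCVGTRPPACCLILLLFKLLATVSQGLPGTGPLGFHFTHSIYNATVYENSAART"
print(most_hydrophilic_window(fragment))
```

A low-scoring window is only a candidate; surface exposure and uniqueness of the peptide would still need to be considered before choosing an immunogen.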

NOV10 

A disclosed NOV10 nucleic acid of 1071 nucleotides (also referred to as CG57553-01)
encoding a T01C1.3-like protein is shown in Table 10A. Putative untranslated regions
upstream and/or downstream from the coding region, if any, are underlined, and the start and
stop codons are in bold letters.



Table 10A. NOV10 nucleotide sequence (SEQ ID NO:23). 

ATGCATCAGTTTACCATCGTTTCATTCCATCCATCCAGGAGACTACACAGAAACAGAGAG 
GACTATGTGGAAAGAAGTGCTGAGTTTGCAGATGGTTTGCTCTCAAAAGCTTTGAAAGAC 
ATTCAGTCTGGAGCACTGGACATAAATAAAGCAGGCATACTTTATGGCATACCTCAAAAA 
ACTTTACTTCTTCACTTAGAAGCCTTACCAGCAGGGAAGCCTGCATCTTTTAAAAACAAA 
ACTCGAGATTTCCATGATAGTTATTCATATAAGGACAGTAAAGAAACTTGTGCAGTGCTG 
CAAAAAGTAGCCTTGTGGGCAAGAGCTCAAGCAGAGCGCACAGAAAAAAGTAAACTCAAT 
CTACTTGAAACCTCAGAAATAAAATTCCCAACAGCTTCCACTTACCTCCATCAGCTAACT 
CTACAGAAAATGGTCACTCAGTTTAAAGAAAAAAATGATAGCCTCCAATATGAAACTTCA
AATCCTACTGTACAGTTAAAAATTCCTCAGCTACGAGTAAGTTCTGTCTCAAAATCACAA 
CCTGATGGTTCTGGTCTGTTGGATGTTATGTATCAAGTTTCCAAAACCTCTTCAGTCCTA 
GAAGGATCAGCTCTCCAAAAACTGAAAAATATACTCCCTAAACAGAACAAAATAGAATGT 
TCTGGGCCTGTAACTCACTCAAGTGTTGACTCTTACTTTCTACATGGGGACCTCTCTCCT 
TTGTGTCTTAATTCTAAAAATGGAACAGTTGATGGAACCTCTGAAAATACTGAAGATGGA 
TTAGATCGAAAAGACAGTAAGCAGCCCAGGAAAAAACGTGGCCGCTATCGGCAATATGAT 
CATGAAATAATGGAAGAAGCTATTGCAATGGTAATGAGCGGAAAAATGAGTGTTTCCAAA 
GCACAAGGAATTTATGGGGTACCTCACAGCACTTTAGAATACAAGGTAAAAGAAAGATCT 
GGAACACTGAAGACTCCTCCGAAGAAGAAACTACGATTACCAGACACTGGGTTATATAAT 
ATGACAGATTCAGGGACTGGCAGCTGCAAAAACAGCAGCAAGCCTGTGTAG 



In a search of public sequence databases, the NOV10 nucleic acid sequence, located on
chromosome 4, has 165 of 267 bases (61%) identical to a
gb:GENBANK-ID:BGDNA66KD|acc:X87727.1 mRNA from Borrelia garinii (B.garinii p66 gene for 66kDa
protein). Public nucleotide databases include all GenBank databases and the GeneSeq patent
database.

The disclosed NOV10 polypeptide (SEQ ID NO:24) encoded by SEQ ID NO:23 has
356 amino acid residues and is presented in Table 10B using the one-letter amino acid code.
Signal P, Psort and/or Hydropathy results predict that NOV10 has no signal peptide and is
likely to be localized in the nucleus with a certainty of 0.7000.

Table 10B. Encoded NOV10 protein sequence (SEQ ID NO:24).

MHQFTIVSFHPSRRLHRNREDYVERSAEFADGLLSKALKDIQSGALDINKAGILYGIPQK 
TLLLHLEALPAGKPASFKNKTRDFHDSYSYKDSKETCAVLQKVALWARAQAERTEKSKLN 
LLETSEIKFPTASTYLHQLTLQKMVTQFKEKNESLQYETSNPTVQLKIPQLRVSSVSKSQ
PDGSGLLDVMYQVSKTSSVLEGSALQKLKNILPKQNKIECSGPVTHSSVDSYFLHGDLSP
LCLNSKNGTVDGTSENTEDGLDRKDSKQPRKKRGRYRQYDHEIMEEAIAMVMSGKMSVSK
AQGIYGVPHSTLEYKVKERSGTLKTPPKKKLRLPDTGLYNMTDSGTGSCKNSSKPV
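The relationship between Table 10A and Table 10B can be spot-checked by translating the nucleotide sequence with the standard genetic code. The sketch below is a minimal illustration, not the method used to prepare the tables: the helper name `translate` is hypothetical, only the first and last lines of Table 10A are embedded, and the reading frame is assumed to start at the first base of each embedded fragment.

```python
from itertools import product

# Standard genetic code, enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1, ignoring any incomplete trailing codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First and last lines of the NOV10 sequence in Table 10A.
first_line = "ATGCATCAGTTTACCATCGTTTCATTCCATCCATCCAGGAGACTACACAGAAACAGAGAG"
last_line = "ATGACAGATTCAGGGACTGGCAGCTGCAAAAACAGCAGCAAGCCTGTGTAG"

print(translate(first_line))
print(translate(last_line))

# 1071 nucleotides give 357 codons; one of them is the stop codon.
print(1071 // 3 - 1)
```

The two translated fragments correspond to the start and end of the Table 10B sequence, and the codon count is consistent with the stated length of 356 residues.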



A search of sequence databases reveals that the NOV10 amino acid sequence has 47 of
144 amino acid residues (32%) identical to, and 77 of 144 amino acid residues (53%) similar
to, the 185 amino acid residue ptnr:SPTREMBL-ACC:Q22051 protein from Caenorhabditis
elegans (T01C1.3 PROTEIN). Public amino acid databases include the GenBank databases,
SwissProt, PDB and PIR.

NOV10 is expressed in at least brain and kidney. This information was derived by
determining the tissue sources of the sequences that were included in the invention, including
but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE
sources.

The disclosed NOV10 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 10C.



Table 10C. BLAST results for NOV10

Gene Index/Identifier                      Protein/Organism                        Length (aa)  Identity (%)    Positives (%)   Expect
gi|18559097|ref|XP_087581.1| (XM_087581)   protein XP_087581 [Homo sapiens]        213          213/213 (100%)  213/213 (100%)  e-110
gi|16549953|dbj|BAB70892.1| (AK055258)     unnamed protein product [Homo sapiens]  213          212/213 (99%)   213/213 (100%)  e-109
gi|14017807|dbj|BAB47424.1| (AB058698)     KIAA1795 protein [Homo sapiens]         572          94/242 (38%)    125/242 (50%)   7e-28
gi|14744761|ref|XP_050988.1| (XM_050988)   KIAA1795 protein [Homo sapiens]         433          94/242 (38%)    125/242 (50%)   4e-26
gi|18487927|ref|XP_081696.1| (XM_081696)   Eip93F [Drosophila melanogaster]        1165         37/76 (48%)     46/76 (59%)     6e-10



The T01C1.3-like protein described in this invention is similar to the C. elegans
protein T01C1.3, a novel protein. The T01C1.3-like protein appears to be expressed in kidney
and brain and may therefore play a role in the development of cancer, neurological diseases, or
metabolic disorders. The T01C1.3-like gene maps to human chromosome 4 and has no
identifiable domains.

The disclosed NOV10 nucleic acid of the invention encoding a T01C1.3-like protein 
includes the nucleic acid whose sequence is provided in Table 10A or a fragment thereof. The 
invention also includes a mutant or variant nucleic acid any of whose bases may be changed 
from the corresponding base shown in Table 10A while still encoding a protein that maintains 
its T01C1.3-like activities and physiological functions, or a fragment of such a nucleic acid. 
The invention further includes nucleic acids whose sequences are complementary to those just 
described, including nucleic acid fragments that are complementary to any of the nucleic acids 
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or 
complements thereto, whose structures include chemical modifications. Such modifications 
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least 
in part to enhance the chemical stability of the modified nucleic acid, such that they may be 
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. 
In the mutant or variant nucleic acids, and their complements, up to about 39 percent of the 
bases may be so changed. 
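The "up to about 39 percent" limit on changed bases can be illustrated with a simple position-by-position comparison. The sketch below is illustrative only: it assumes substitutions without insertions or deletions (so both sequences have equal length), and the names `fraction_changed` and `within_limit` as well as the short example sequences are hypothetical.

```python
def fraction_changed(reference: str, variant: str) -> float:
    """Fraction of positions at which the variant differs from the reference.

    Assumes substitutions only, so both sequences must have the same length.
    """
    if not reference or len(reference) != len(variant):
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(
        1 for ref_base, var_base in zip(reference.upper(), variant.upper())
        if ref_base != var_base
    )
    return differences / len(reference)


def within_limit(reference: str, variant: str, limit: float = 0.39) -> bool:
    """True if no more than `limit` of the bases have been changed."""
    return fraction_changed(reference, variant) <= limit


# Hypothetical ten-base example: one substitution at the last position.
print(fraction_changed("ATGAGCGGCG", "ATGAGCGGCA"))
print(within_limit("ATGAGCGGCG", "ATGAGCGGCA"))
```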

The disclosed NOV10 protein of the invention includes the T01C1.3-like protein 
whose sequence is provided in Table 10B. The invention also includes a mutant or variant 
protein any of whose residues may be changed from the corresponding residue shown in Table 
10B while still encoding a protein that maintains its T01C1.3-like activities and physiological 
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 68 
percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or 
(Fab)2, that bind immunospecifically to any of the proteins of the invention. 

The above defined information for this invention suggests that this T01C1.3-like 
protein (NOV10) may function as a member of a "T01C1.3 family". Therefore, the NOV10 
nucleic acids and proteins identified here may be useful in potential therapeutic applications 
implicated in (but not limited to) various pathologies and disorders as indicated below. The 
potential therapeutic applications for this invention include, but are not limited to: protein 
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug 
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene 
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues 
and cell types composing (but not limited to) those defined here. 

The NOV10 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the T01C1.3-like protein 
(NOV10) may be useful in gene therapy, and the T01C1.3-like protein (NOV10) may be 
useful when administered to a subject in need thereof. By way of nonlimiting example, the 
compositions of the present invention will have efficacy for treatment of patients suffering 
from cancer, trauma, bacterial and viral infections, in vitro and in vivo regeneration, Von 
Hippel-Lindau (VHL) syndrome, Alzheimer's disease, stroke, tuberous sclerosis, 
hypercalcemia, Parkinson's disease, Huntington's disease, cerebral palsy, epilepsy, 
Lesch-Nyhan syndrome, multiple sclerosis, ataxia-telangiectasia, leukodystrophies, behavioral 
disorders, addiction, anxiety, pain, neurodegeneration, diabetes, autoimmune disease, renal 
artery stenosis, interstitial nephritis, glomerulonephritis, polycystic kidney disease, systemic 
lupus erythematosus, renal tubular acidosis, and IgA nephropathy, or other 
pathologies or conditions. The NOV10 nucleic acid encoding the T01C1.3-like protein of the 
invention, or fragments thereof, may further be useful in diagnostic applications, wherein the 
presence or amount of the nucleic acid or the protein is to be assessed. 

NOV10 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immunospecifically to the novel NOV10 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the 
"Anti-NOVX Antibodies" section below. The disclosed NOV10 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding the pathology of the disease and in developing new drug targets for various 
disorders. 
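One common way to prepare a hydrophobicity chart is a sliding-window average over a per-residue hydropathy scale. The sketch below uses the Kyte-Doolittle scale as an assumption; the text above does not state which scale or window size was used, and the function names, the window of 9 residues, and the example peptide are all hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophilic_windows(protein: str, window: int = 9,
                        threshold: float = 0.0) -> list[int]:
    """Start indices (0-based) of windows whose mean falls below threshold."""
    profile = hydropathy_profile(protein, window)
    return [i for i, mean in enumerate(profile) if mean < threshold]


# Hypothetical peptide: a lysine-rich stretch followed by a leucine-rich one.
print(hydrophilic_windows("KKKKKKKKKLLLLLLLLL"))
```

Windows reported by `hydrophilic_windows` are only candidates; whether a given region is a suitable immunogen would still need to be assessed by the methods the specification refers to.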

NOV11 

NOV11 includes three alpha-macroglobulin-like proteins disclosed below. The 
disclosed sequences have been named NOV11a, NOV11b and NOV11c. 

NOV11a 

A disclosed NOV11a nucleic acid of 6195 nucleotides (also referred to as 
CG57488_01) encoding an alpha-macroglobulin-like protein is shown in Table 11A. Putative 
untranslated regions upstream and/or downstream from the coding region, if any, are 
underlined, and the start and stop codons are in bold letters. 

Table 11A. NOV11a nucleotide sequence (SEQ ID NO:25). 

ATGAGCGGCGCCCTGCTCTGGCCGTTGCTCCCGCTCCTGCTCCTGCTGCTGTCGGCGCGG 
GACGGCGTGCGCGCCGCGCAGCCTCAGGCCCCGGGTTACTTGATTGCAGCTCCCTCTGTT 
TTTCGCGCGGGCGTGGAGGAAGTCATCAGCGTGACCATCTTTAACTCTCCAAGGGAAGTC 
ACGGTCCAGGCTCAGCTGGTGGCCCAGGGTGAGCCGGTGGTGCAGAGCCAGGGAGCCATC 
CTGGATAAAGGGACAATCAAACTCAAGCATACGGTCCTCAGCACCTCCGGTATCTCCCTC 
CTGCCCATCCTTGCCCTGCTCTTGGGTGGCCGGGACCTTTCCTCCCTCTTCAGCCTCTGG 
CCAGTGTTGAGATATTTCCAGAAACAGGGCCAGGTGCCCACGGGCCTCCGGGGCCAAGCG 
CTTCTGAAAGTGTGGGGCCGCGGCTGGCAGGCGGAGGAGGGGCCCCTCTTTCACAACCAG 
ACCTCGGTGACCGTGGACGGCCGGGGCGCTTCTGTATTCATCCAGACGGACAAGCCTGTG 
TACAGACCCCAGCACCGAGTGCTCATAAGCATCTTCACCGTCTCTCCAAATCTGAGGCCT 
GTCAACGAGAAGCTGGAAGCCTACATCCTGGACCCCCGAGGCTCTCGGATGATAGAGTGG 
AGACACTTAAAGCCGTTCTGCTGCGGCATCACCAACATGAGCTTCCCCTTGTCCGACCAG 
CCTGTGTTGGGAGAATGGTTCATTTTTGTTGAAATGCAAGGCCACGCGTACAACAAGTCT 
TTTGAAGTTCAGAAGTATGTGTTGCCCAAGTTTGAGCTTCTGATTGACCCGCCCCGGTAT 
ATCCAAGACCTGGACGCCTGTGAGACAGGCACTGTGCGGGCCAGGTATACCTTTGGGAAA 
CCTGTGGCTGGTGCCTTAATGATCAACATGACTGTTAATGGTGTAGGGTACTACAGCCAC 
GAGGTGGGACGCCCTGTCCTCAGAACAACCAAGATCCTCGGCTCCCGGGACTTCGACATC 
TGCGTGAGGGACATGATCCCAGCGGACGTCCCTGAGCACTTCCGGGGCAGGGTCAGCATC 
TGGGCCATGGTGACCAGTGTGGACGGGAGCCAGCAGGTCGCGTTCGATGACTCCACCCCC 
GTGCAGAGGCAGCTGGTGGACATCCGGTACTCCAAGGACACGAGGAAGCAGTTCAAGCCG 
GGCCTGGCCTACGTGGGGAAGGTGGAGCTATCCTACCCCGATGGCAGCCCAGCTGAGGGG 
GTGACGGTCCAGATTAAGGCAGAGCTGACACCAAAGGATAACATCTACACCAGTGAAGTT 
GTGTCCCAGCGTGGACTAGTGGGGTTTGAAATCCCCTCCATCCCCACGTCAGCCCAGCAC 
GTGTGGCTGGAGACCAAGGTGATGGCACTGAACGGGAAGCCCGTGGGGGCTCAGTACCTG 
CCCAGCTACCTCTCCCTCGGCAGCTGGTACTCCCCCAGCCAGTGCTACCTGCAGCTGCAG 
CCACCCTCCCACCCACTGCAGGTTGGGGAAGAAGCCTATTTTTCTGTGAAGTCCACATGT 
CCCTGCAACTTTACCCTGTACTACGAGGTGGCTGCACGGGGCAATATTGTGCTATCGGGC 
CAGCAGCCTGCCCACACCACCCAGCAGCGAAGCAAGCGGGCGGCCCCTGCCCTGGAGAAA 
CCGATTCGTTTAACACACCTTTCTGAGACAGAGCCCCCACCAGCCCCAGAAGCTGAGGTC 
GACGTGTGTGTGACCTCTCTTCATCTGGCCGTGACCCCCAGCATGGTCCCCCTTGGTCGC 
CTGCTGGTCTTCTACGTCAGGGAGAATGGAGAAGGGGTCGCCGACAGCCTTCAGTTTGCA 
GTCGAGACCTTCTTCGAAAACCAGGTTTCAGTGACGTATTCAGCAAATGAGACCCAACCT 
GGGGAGGTTGTCGACCTGCGGATCAGGGCTGCAAGGGGCAGCTGTGTGTGCGTCGCCGCA 
GTTGATAAGAGTGTCTACCTGCTCAGGTCTGGGTTCCGGCTGACTCCTGCCCAGGTTTTC 
CAGGAACTGGAAGATTATGATGTTTCTGATTCCTTTGGCGTGTCCAGGGAGGATGGTCCT 
TTTTGGTGGGCTGGGCTGACGGCACAACGACGCCGGCGCTCCTCTGTCTTCCCGTGGCCT 
TGGGGCATCACCAAGGACTCTGGGTTTGCCTTCACCGAAACGGGACTGGTGGTGATGACC 
GACCGAGTGAGCCTGAACCACCGGCAGGACGGTGGCCTCTACACCGATGAGGCTGTCCCC 
GCTTTCCAGCCCCACACAGGGAGCCTGGTGGCAGTGGCTCCTTCCAGGCACCCCCCCAGA 
ACAGAGAAGAGAAAAAGGACTTTCTTCCCCGAAACATGGATTTGGCATTGTCTCAACATC 
AGTGACCCATCTGGTGAGGGGACACTCAGTGTGAAGGTCCCGGACTCCATCACCAGCTGG 
GTGGGTGAGGCCGTGGCCCTGTCCACCTCTCAGGGCTTAGGCATCGCCGAGCCCTCCCTG 
CTGAAGACCTTCAAGCCCTTCTTCGTGGACTTCATGCTCCCCGCTCTCATCATCCGTGGG 
GAGCAGGTCAAGATCCCGCTCAGTGTCTACAACTACATGGGCACCTGCGCTGAGGTGTAC 
ATGAAGCTCTCGGTTCCCAAGGGCATCCAGTTTGTTGGGCATCCTGGCAAACGCCATGTG 
ACCAAGAAGATGTGTGTGGCCCCCGGGGAGGCTGAGCCCATCTGGGTCGTTCTGTCCTTC 
AGCGACCTGGGACTCAACAACATCACGGCCAAAGCCCTTGCTTACGGAGACACAAATTGC 
TGCCGGGATGGGAGGTCCAGCAAACACCCTGAGGAGAATCACGCCGACAGGAGGGTCCCC 
ATCGGGGTGGATCACGTCAGGCGCAGTGTGATGGTTGAGGCGGAAGGAGTCCCCCGGGCG 
TACACCTACAGCGCATTCTTCTGTCCCAGTGAGAGAGTCCACATCTCCACCCCCAACAAG 
TATGAGTTCCAGTATGTGCAGCGGCCACTGCGCCTCACCCGCTTTGATGTGGCTGTGCGA 
GCTCACAATGATGCCCGTGTGGCCTTGTCTTCTGGGCCCCAGGACACAGCAGGCATGATC 
GAGATCGTCCTGGGGGGGCATCAGAACACCAGGTCATGGATCTCCACCAGCAAGATGGGA 
GAGCCCGTGGCCAGTGCACACACGGCCAAGATCCTCTCCTGGGATGAATTCAGAACATTC 
TGGATCAGCTGGCGTGGTGGCCTTATCCAGGTTGGCCATGGTCCAGAGCCATCCAATGAG 
TCTGTCATTGTGGCCTGGACCCTCCCGAGGCCACCAGAGGTCCAGTTCATTGGCTTTTCC 
ACCGGCTGGGGCTCCATGGGTGAATTCCGAATCTGGAGGAAGATGGAGGTGGACGAGAGC 
TACAGCGAGGCCTTCACCCTGGGGGTCCCACACGGCGCCATCCCTGGGTCTGAGCGAGCC 
ACCGCCTCCATCATCGGGGACGTCATGGGGCCAACCCTGAACCACCTCAACAACCTCCTG 
CGGCTGCCGTTTGGCTGTGGAGAGCAGAACATGATCCACTTTGCACCCAACGTCTTTGTC 
TTGAAGTATCTTCAGAAAACCCAGCAGCTCAGCCCTGAGGTGGAGAGAGAGACCACCGAC 
TACCTAGTACAAGGCTACCAGCGCCAGCTGACCTACAAGCGCCAGGATGGCTCCTACAGC 
GCGTTTGGGGAGCGGGACGCATCGGGGAGCATGTGGCTCACAGCCTTTGTCCTGAAGTCC 
TTCGCACAGGCTCGCAGCTTTATCTTCGTGGACCCCCGGGAGCTGGCTGCCGCCAAGAGC 
TGGATCATCCAGCAGCAGCAGGCCGATGGCTCCTTCCTGGCCGTGGGCAGGGTCCTGAAC 
AAGGACATCCAGGGTGGGATCCACGGCATTGTCCCGCTGACAGCCTACGTGGTGGTTGCT 
CTCCTGGAAACAGGCACAGCCTCAGAGGAGGAGAGAGGCTCCACTGACAAAGCGAGGCAC 
TTCCTGGAGTCTGCTGCGCCCCTGGCCATGGACCCTTATAGCTGTGCCCTGACTACCTAC 
GCGCTGACCCTGCTCCGCAGCCCGGCAGCCCCTGAGGCACTGCGCAAGCTCCGTAGCCTG 
GCCATCATGCGAGATGGGGTCACCCACTGGAGCCTGTCAAATTCCTGGGACGTGGACAAG 
GGCACATTCTTGAGCTTCAGTGACAGGGTCTCTCAGTCAGTGGTCTCGGCCGAGGTGGAA 
ATGACAGCCTACGCCCTTCTGACCTACACTCTGCTGGGTGACGTGGCTGCCGCCCTGCCT 
GTGGTGAAGTGGCTGTCCCAGCAGCGAAATGCACTTGGGGGCTTCTCCTCCACTCAGGAC 
ACCTGCGTGGCTCTGCAGGCCTTGGCTGAATATGCCATCTTGTCCTATGCTGGAGGCATC 
AACCTCACTGTCTCCCTGGCCTCCACCAACCTGGACTACCAGGAAACCTTCGAGCTGCAC 
AGGACCAACCAGAAGGTTCTGCAGACAGCAGCGATCCCCAGCCTCCCCACGGGGCTGTTT 
GTGAGTGCCAAGGGGGACGGCTGCTGCCTGATGCAGATTGATGTCACCTACAATGTGCCT 
GACCCGGTGGCCAAGCCAGCTTTCCAGCTGCTCGTAAGCCTCCAGGAGCCTGAGGCCCAG 
GGACGCCCGCCCCCCATGCCTGCCTCCGCAGCTGAGGGTTCCCGAGGAGACTGGCCCCCA 
GCTGACGATGATGACCCAGCGGCCGATCAGCATCACCAGGAATACAAGGTGATGCTGGAG 
GTGTGCACCAGGTGGCTGCATGCAGGGTCTTCCAATATGGCTGTCCTGGAGGTGCCCCTG 
CTGTCAGGCTTCCGGGCAGACATCGAGAGCCTGGAGCAGCTGCTCCTTGACAAGCACATG 
GGGATGAAGAGGTATGAAGTGGCTGGACGCCGAGTGCTCTTCTACTTTGATGAGATCCCC 
AGCCGGTGCCTGACGTGCGTGCGGTTCCGTGCTCTCCGGGAGTGCGTGGTGGGCAGGACG 
TCGGCGCTGCCAGTCTCCGTGTACGACTACTACGAACCCGCCTTCGAGGCCACTCGCTTC 
TACAACGTCAGCACGCACAGCCCACTCGCCCGGGAACTGTGCGCCGGACCCGCGTGCAAC 
GAAGTGGAGCGCGCCCCTGCCCGGGGCCCGGGCTGGTTCCCCGGCGAGTCGGGCCCTGCC 
GTGGCCCCTGAGGAGGGGGCGGCGATCGCGCGATGCGGCTGCGACCACGACTGCGGCGCC 
CAGGGGAACCCGGTGTGCGGCTCCGACGGGGTGGTCTACGCCAGCGCCTGCCGCCTGCGG 
GAGGCCGCCTGCCGCCAGGCCGCGCCCCTGGAGCCCGCGCCTCCCAGCTGCTGCGCCCTC 
GAGCAGCGGCTGCCGGCCTCGTCGTCCTCCACCTACGGGGATGACCTGGCTTCTGTGGCC 
CCGGGGCCTTTACAGCAGGACGTGAAGCTGAATGGAGCCGGCCTTGAGGTGGAGGACTCA 
GACCCTGAGCCTGAAGGGGAGGCGGAGGACAGGGTCACAGCCGGGCCTCGGCCTCCTGTG 
AGCAGCGGGAACCTGGAAAGCAGCACCCAGAGCGCCAGCCCGTTCCACAGATGGGGCCAG 
ACTCCGGCCCCTCAGAGACATAGTGGCCGGGTGGTGGGGGCCCACAGGCCAGGGCTTCTG 
AGCCCTGTCTTCGTCTACAGCCCAGCCTTTCAGAGTGGTGGGGAGGAGGGTTTATGGATG 
TCAAACACCTGCACCTTGAGATAATCCTACAACCACATGCAGTTGTGGGACCGCAGTTTG 
GTCCTGGGGACCATTCATACCCACACACCCAGCTTGTGCCTGTGGTTAACATCTCAGAAA 
ACTCTGGTAAATGATCACTCCAGGATATTGACACGAATACACGTTACTGATCTTACTCAC 
ATGTTCTGGGGTGCACATGAACTTTGTGTGTGCATGTGTGTGTGTGTGCATGTGTGTGTC 
CCGGGCACCTGACACCCCCAGCCCAGGGCTGCCCAAAGTTGGGCTGATCAGAGACATAGA 
CCCAATGAGGAGCCCAACAGTGGCCCTCCAACCCTCTGCCTTGCCCCCATAGTTCATGCC 
CCAGTGGTCTTTGAAACTGCCCTGTGCCACTCCCTGGAGTGAGCAGCGGTGTCTCTGTGT 
GTGTGTGTGTCTGTG 



In a search of public sequence databases, the NOV11a nucleic acid sequence, located 
on chromosome 19, has 5574 of 5594 bases (99%) identical to a 
gb:GENBANK-ID:AB033109|acc:AB033109.1 mRNA from Homo sapiens (Homo sapiens mRNA for 
KIAA1283 protein, partial cds). Public nucleotide databases include all GenBank databases 
and the GeneSeq patent database. 

The disclosed NOV11a polypeptide (SEQ ID NO:26) encoded by SEQ ID NO:25 has 
1927 amino acid residues and is presented in Table 11B using the one-letter amino acid code. 
SignalP, Psort and/or Hydropathy results predict that NOV11a has a signal peptide and is 
likely to be localized at the plasma membrane with a certainty of 0.6400. The most likely 
cleavage site for a NOV11a peptide is between amino acids 25 and 26. 



Table 11B. Encoded NOV11a protein sequence (SEQ ID NO:26). 

MSGALLWPLLPLLLLLLSARDGVRAAQPQAPGYLIAAPSVFRAGVEEVISVTIFNSPREV 
TVQAQLVAQGEPVVQSQGAILDKGTIKLKHTVLSTSGISLLPILALLLGGRDLSSLFSLW 
PVLRYFQKQGQVPTGLRGQALLKVWGRGWQAEEGPLFHNQTSVTVDGRGASVFIQTDKPV 
YRPQHRVLISIFTVSPNLRPVNEKLEAYILDPRGSRMIEWRHLKPFCCGITNMSFPLSDQ 
PVLGEWFIFVEMQGHAYNKSFEVQKYVLPKFELLIDPPRYIQDLDACETGTVRARYTFGK 
PVAGALMINMTVNGVGYYSHEVGRPVLRTTKILGSRDFDICVRDMIPADVPEHFRGRVSI 
WAMVTSVDGSQQVAFDDSTPVQRQLVDIRYSKDTRKQFKPGLAYVGKVELSYPDGSPAEG 
VTVQIKAELTPKDNIYTSEVVSQRGLVGFEIPSIPTSAQHVWLETKVMALNGKPVGAQYL 
PSYLSLGSWYSPSQCYLQLQPPSHPLQVGEEAYFSVKSTCPCNFTLYYEVAARGNIVLSG 
QQPAHTTQQRSKRAAPALEKPIRLTHLSETEPPPAPEAEVDVCVTSLHLAVTPSMVPLGR 
LLVFYVRENGEGVADSLQFAVETFFENQVSVTYSANETQPGEVVDLRIRAARGSCVCVAA 
VDKSVYLLRSGFRLTPAQVFQELEDYDVSDSFGVSREDGPFWWAGLTAQRRRRSSVFPWP 
WGITKDSGFAFTETGLVVMTDRVSLNHRQDGGLYTDEAVPAFQPHTGSLVAVAPSRHPPR 
TEKRKRTFFPETWIWHCLNISDPSGEGTLSVKVPDSITSWVGEAVALSTSQGLGIAEPSL 
LKTFKPFFVDFMLPALIIRGEQVKIPLSVYNYMGTCAEVYMKLSVPKGIQFVGHPGKRHV 
TKKMCVAPGEAEPIWVVLSFSDLGLNNITAKALAYGDTNCCRDGRSSKHPEENHADRRVP 
IGVDHVRRSVMVEAEGVPRAYTYSAFFCPSERVHISTPNKYEFQYVQRPLRLTRFDVAVR 
AHNDARVALSSGPQDTAGMIEIVLGGHQNTRSWISTSKMGEPVASAHTAKILSWDEFRTF 
WISWRGGLIQVGHGPEPSNESVIVAWTLPRPPEVQFIGFSTGWGSMGEFRIWRKMEVDES 
YSEAFTLGVPHGAIPGSERATASIIGDVMGPTLNHLNNLLRLPFGCGEQNMIHFAPNVFV 
LKYLQKTQQLSPEVERETTDYLVQGYQRQLTYKRQDGSYSAFGERDASGSMWLTAFVLKS 
FAQARSFIFVDPRELAAAKSWIIQQQQADGSFLAVGRVLNKDIQGGIHGIVPLTAYVVVA 
LLETGTASEEERGSTDKARHFLESAAPLAMDPYSCALTTYALTLLRSPAAPEALRKLRSL 
AIMRDGVTHWSLSNSWDVDKGTFLSFSDRVSQSVVSAEVEMTAYALLTYTLLGDVAAALP 
VVKWLSQQRNALGGFSSTQDTCVALQALAEYAILSYAGGINLTVSLASTNLDYQETFELH 
RTNQKVLQTAAIPSLPTGLFVSAKGDGCCLMQIDVTYNVPDPVAKPAFQLLVSLQEPEAQ 
GRPPPMPASAAEGSRGDWPPADDDDPAADQHHQEYKVMLEVCTRWLHAGSSNMAVLEVPL 
LSGFRADIESLEQLLLDKHMGMKRYEVAGRRVLFYFDEIPSRCLTCVRFRALRECVVGRT 
SALPVSVYDYYEPAFEATRFYNVSTHSPLARELCAGPACNEVERAPARGPGWFPGESGPA 
VAPEEGAAIARCGCDHDCGAQGNPVCGSDGVVYASACRLREAACRQAAPLEPAPPSCCAL 
EQRLPASSSSTYGDDLASVAPGPLQQDVKLNGAGLEVEDSDPEPEGEAEDRVTAGPRPPV 
SSGNLESSTQSASPFHRWGQTPAPQRHSGRVVGAHRPGLLSPVFVYSPAFQSGGEEGLWM 
SNTCTLR 
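The relationship between Table 11A and Table 11B (a protein sequence encoded by a nucleotide sequence) can be illustrated by translating codons with the standard genetic code. The sketch below is a minimal illustration, not the procedure used to prepare the tables; the function name `translate` is hypothetical, and it assumes translation starts at the first base supplied and stops at the first in-frame stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate from the first base up to (not including) the first stop."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 60 nucleotides of Table 11A.
first_line = "ATGAGCGGCGCCCTGCTCTGGCCGTTGCTCCCGCTCCTGCTCCTGCTGCTGTCGGCGCGG"
print(translate(first_line))
```

Translating the first line of Table 11A in this way reproduces the first twenty residues of Table 11B.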



A search of sequence databases reveals that the NOV11a amino acid sequence has 
1794 of 1797 amino acid residues (99%) identical to, and 1796 of 1797 amino acid residues 
(99%) similar to, the 1884 amino acid residue ptnr:SPTREMBL-ACC:Q9ULD7 protein from 
Homo sapiens (Human) (KIAA1283 PROTEIN). Public amino acid databases include the 
GenBank databases, SwissProt, PDB and PIR. 

NOV11a is expressed in at least Adrenal Gland/Suprarenal gland, Bone Marrow, 
Brain, Heart, Kidney, Lung, Lymphoid tissue, Mammary gland/Breast, Pituitary Gland, 
Placenta, Prostate, Retina, Salivary Glands, Spleen, Thalamus, and Thyroid. This information was 
derived by determining the tissue sources of the sequences that were included in the invention, 
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or 
RACE sources. 






NOV11b 

A disclosed NOV11b nucleic acid of 6069 nucleotides (also referred to as 
CG57488_02) encoding an alpha-macroglobulin-like protein is shown in Table 11C. Putative 
untranslated regions upstream and/or downstream from the coding region, if any, are 
underlined, and the start and stop codons are in bold letters. 



Table 11C. NOV11b nucleotide sequence (SEQ ID NO:27). 

ATGAGCGGCGCCCTGCTCTGGCCGTTGCTCCCGCTCCTGCTCCTGCTGCTGTCGGCGCGG 
GACGGCGTGCGCGCCGCGCAGCCTCAGGCCCCGGGTTACTTGATTGCAGCTCCCTCTGTT 
TTTCGCGCGGGCGTGGAGGAAGTCATCAGCGTGACCATCTTTAACTCTCCAAGGGAAGTC 
ACGGTCCAGGCTCAGCTGGTGGCCCAGGGTGAGCCGGTGGTGCAGAGCCAGGGAGCCATC 
CTGGATAAAGGGACAATCAAACTCAAGGTGCCCACGGGCCTCCGGGGCCAAGCGCTTCTG 
AAAGTGTGGGGCCGCGGCTGGCAGGCGGAGGAGGGGCCCCTCTTTCACAACCAGACCTCG 
GTGACCGTGGACGGCCGGGGCGCTTCTGTATTCATCCAGACGGACAAGCCTGTGTACAGA 
CCCCAGCACCGAGTGCTCATAAGCATCTTCACCGTCTCTCCAAATCTGAGGCCTGTCAAC 
GAGAAGCTGGAAGCCTACATCCTGGACCCCCGAGGCTCTCGGATGATAGAGTGGAGACAC 
TTAAAGCCGTTCTGCTGCGGCATCACCAACATGAGCTTCCCCTTGTCCGACCAGCCTGTG 
TTGGGAGAATGGTTCATTTTTGTTGAAATGCAAGGCCACGCGTACAACAAGTCTTTTGAA 
GTTCAGAAGTATGTGTTGCCCAAGTTTGAGCTTCTGATTGACCCGCCCCGGTATATCCAA 
GACCTGGACGCCTGTGAGACAGGCACTGTGCGGGCCAGGTATACCTTTGGGAAACCTGTG 
GCTGGTGCCTTAATGATCAACATGACTGTTAATGGTGTAGGGTACTACAGCCACGAGGTG 
GGACGCCCTGTCCTCAGAACAACCAAGATCCTCGGCTCCCGGGACTTCGACATCTGCGTG 
AGGGACATGATCCCAGCGGACGTCCCTGAGCACTTCCGGGGCAGGGTCAGCATCTGGGCC 
ATGGTGACCAGTGTGGACGGGAGCCAGCAGGTCGCGTTCGATGACTCCACCCCCGTGCAG 
AGGCAGCTGGTGGACATCCGGTACTCCAAGGACACGAGGAAGCAGTTCAAGCCGGGCCTG 
GCCTACGTGGGGAAGGTGGAGCTATCCTACCCCGATGGCAGCCCAGCTGAGGGGGTGACG 
GTCCAGATTAAGGCAGAGCTGACACCAAAGGATAACATCTACACCAGTGAAGTTGTGTCC 
CAGCGTGGACTAGTGGGGTTTGAAATCCCCTCCATCCCCACGTCAGCCCAGCACGTGTGG 
CTGGAGACCAAGGTGATGGCACTGAACGGGAAGCCCGTGGGGGCTCAGTACCTGCCCAGC 
TACCTCTCCCTCGGCAGCTGGTACTCCCCCAGCCAGTGCTACCTGCAGCTGCAGCCACCC 
TCCCACCCACTGCAGGTTGGGGAAGAAGCCTATTTTTCTGTGAAGTCCACATGTCCCTGC 
AACTTTACCCTGTACTACGAGGTGGCTGCACGGGGCAATATTGTGCTATCGGGCCAGCAG 
CCTGCCCACACCACCCAGCAGCGAAGCAAGCGGGCGGCCCCTGCCCTGGAGAAACCGATT 
CGTTTAACACACCTTTCTGAGACAGAGCCCCCACCAGCCCCAGAAGCTGAGGTCGACGTG 
TGTGTGACCTCTCTTCATCTGGCCGTGACCCCCAGCATGGTCCCCCTTGGTCGCCTGCTG 
GTCTTCTACGTCAGGGAGAATGGAGAAGGGGTCGCCGACAGCCTTCAGTTTGCAGTCGAG 
ACCTTCTTCGAAAACCAGGTTTCAGTGACGTATTCAGCAAATGAGACCCAACCTGGGGAG 
GTTGTCGACCTGCGGATCAGGGCTGCAAGGGGCAGCTGTGTGTGCGTCGCCGCAGTTGAT 
AAGAGTGTCTACCTGCTCAGGTCTGGGTTCCGGCTGACTCCTGCCCAGGTTTTCCAGGAA 
CTGGAAGATTATGATGTTTCTGATTCCTTTGGCGTGTCCAGGGAGGATGGTCCTTTTTGG 
TGGGCTGGGCTGACGGCACAACGACGCCGGCGCTCCTCTGTCTTCCCGTGGCCTTGGGGC 
ATCACCAAGGACTCTGGGTTTGCCTTCACCGAAACGGGACTGGTGGTGATGACCGACCGA 
GTGAGCCTGAACCACCGGCAGGACGGTGGCCTCTACACCGATGAGGCTGTCCCCGCTTTC 
CAGCCCCACACAGGGAGCCTGGTGGCAGTGGCTCCTTCCAGGCACCCCCCCAGAACAGAG 
AAGAGAAAAAGGACTTTCTTCCCCGAAACATGGATTTGGCATTGTCTCAACATCAGTGAC 
CCATCTGGTGAGGGGACACTCAGTGTGAAGGTCCCGGACTCCATCACCAGCTGGGTGGGT 
GAGGCCGTGGCCCTGTCCACCTCTCAGGGCTTAGGCATCGCCGAGCCCTCCCTGCTGAAG 
ACCTTCAAGCCCTTCTTCGTGGACTTCATGCTCCCCGCTCTCATCATCCGTGGGGAGCAG 
GTCAAGATCCCGCTCAGTGTCTACAACTACATGGGCACCTGCGCTGAGGTGTACATGAAG 
CTCTCGGTTCCCAAGGGCATCCAGTTTGTTGGGCATCCTGGCAAACGCCATGTGACCAAG 
AAGATGTGTGTGGCCCCCGGGGAGGCTGAGCCCATCTGGGTCGTTCTGTCCTTCAGCGAC 
CTGGGACTCAACAACATCACGGCCAAAGCCCTTGCTTACGGAGACACAAATTGCTGCCGG 
GATGGGAGGTCCAGCAAACACCCTGAGGAGAATCACGCCGACAGGAGGGTCCCCATCGGG 
GTGGATCACGTCAGGCGCAGTGTGATGGTTGAGGCGGAAGGAGTCCCCCGGGCGTACACC 
TACAGCGCATTCTTCTGTCCCAGTGAGAGAGTCCACATCTCCACCCCCAACAAGTATGAG 
TTCCAGTATGTGCAGCGGCCACTGCGCCTCACCCGCTTTGATGTGGCTGTGCGAGCTCAC 
AATGATGCCCGTGTGGCCTTGTCTTCTGGGCCCCAGGACACAGCAGGCATGATCGAGATC 
GTCCTGGGGGGGCATCAGAACACCAGGTCATGGATCTCCACCAGCAAGATGGGAGAGCCC 
GTGGCCAGTGCACACACGGCCAAGATCCTCTCCTGGGATGAATTCAGAACATTCTGGATC 
AGCTGGCGTGGTGGCCTTATCCAGGTTGGCCATGGTCCAGAGCCATCCAATGAGTCTGTC 
ATTGTGGCCTGGACCCTCCCGAGGCCACCAGAGGTCCAGTTCATTGGCTTTTCCACCGGC 
TGGGGCTCCATGGGTGAATTCCGAATCTGGAGGAAGATGGAGGTGGACGAGAGCTACAGC 
GAGGCCTTCACCCTGGGGGTCCCACACGGCGCCATCCCTGGGTCTGAGCGAGCCACCGCC 
TCCATCATCGGGGACGTCATGGGGCCAACCCTGAACCACCTCAACAACCTCCTGCGGCTG 
CCGTTTGGCTGTGGAGAGCAGAACATGATCCACTTTGCACCCAACGTCTTTGTCTTGAAG 
TATCTTCAGAAAACCCAGCAGCTCAGCCCTGAGGTGGAGAGAGAGACCACCGACTACCTA 
GTACAAGGCTACCAGCGCCAGCTGACCTACAAGCGCCAGGATGGCTCCTACAGCGCGTTT 
GGGGAGCGGGACGCATCGGGGAGCATGTGGCTCACAGCCTTTGTCCTGAAGTCCTTCGCA 
CAGGCTCGCAGCTTTATCTTCGTGGACCCCCGGGAGCTGGCTGCCGCCAAGAGCTGGATC 
ATCCAGCAGCAGCAGGCCGATGGCTCCTTCCTGGCCGTGGGCAGGGTCCTGAACAAGGAC 
ATCCAGGGTGGGATCCACGGCATTGTCCCGCTGACAGCCTACGTGGTGGTTGCTCTCCTG 
GAAACAGGCACAGCCTCAGAGGAGGAGAGAGGCTCCACTGACAAAGCGAGGCACTTCCTG 
GAGTCTGCTGCGCCCCTGGCCATGGACCCTTATAGCTGTGCCCTGACTACCTACGCGCTG 
ACCCTGCTCCGCAGCCCGGCAGCCCCTGAGGCACTGCGCAAGCTCCGTAGCCTGGCCATC 
ATGCGAGATGGGGTCACCCACTGGAGCCTGTCAAATTCCTGGGACGTGGACAAGGGCACA 
TTCTTGAGCTTCAGTGACAGGGTCTCTCAGTCAGTGGTCTCGGCCGAGGTGGAAATGACA 
GCCTACGCCCTTCTGACCTACACTCTGCTGGGTGACGTGGCTGCCGCCCTGCCTGTGGTG 
AAGTGGCTGTCCCAGCAGCGAAATGCACTTGGGGGCTTCTCCTCCACTCAGGACACCTGC 
GTGGCTCTGCAGGCCTTGGCTGAATATGCCATCTTGTCCTATGCTGGAGGCATCAACCTC 
ACTGTCTCCCTGGCCTCCACCAACCTGGACTACCAGGAAACCTTCGAGCTGCACAGGACC 
AACCAGAAGGTTCTGCAGACAGCAGCGATCCCCAGCCTCCCCACGGGGCTGTTTGTGAGT 
GCCAAGGGGGACGGCTGCTGCCTGATGCAGATTGATGTCACCTACAATGTGCCTGACCCG 
GTGGCCAAGCCAGCTTTCCAGCTGCTCGTAAGCCTCCAGGAGCCTGAGGCCCAGGGACGC 
CCGCCCCCCATGCCTGCCTCCGCAGCTGAGGGTTCCCGAGGAGACTGGCCCCCAGCTGAC 
GATGATGACCCAGCGGCCGATCAGCATCACCAGGAATACAAGGTGATGCTGGAGGTGTGC 
ACCAGGTGGCTGCATGCAGGGTCTTCCAATATGGCTGTCCTGGAGGTGCCCCTGCTGTCA 
GGCTTCCGGGCAGACATCGAGAGCCTGGAGCAGCTGCTCCTTGACAAGCACATGGGGATG 
AAGAGGTATGAAGTGGCTGGACGCCGAGTGCTCTTCTACTTTGATGAGATCCCCAGCCGG 
TGCCTGACGTGCGTGCGGTTCCGTGCTCTCCGGGAGTGCGTGGTGGGCAGGACGTCGGCG 
CTGCCAGTCTCCGTGTACGACTACTACGAACCCGCCTTCGAGGCCACTCGCTTCTACAAC 
GTCAGCACGCACAGCCCACTCGCCCGGGAACTGTGCGCCGGACCCGCGTGCAACGAAGTG 
GAGCGCGCCCCTGCCCGGGGCCCGGGCTGGTTCCCCGGCGAGTCGGGCCCTGCCGTGGCC 
CCTGAGGAGGGGGCGGCGATCGCGCGATGCGGCTGCGACCACGACTGCGGCGCCCAGGGG 
AACCCGGTGTGCGGCTCCGACGGGGTGGTCTACGCCAGCGCCTGCCGCCTGCGGGAGGCC 
GCCTGCCGCCAGGCCGCGCCCCTGGAGCCCGCGCCTCCCAGCTGCTGCGCCCTCGAGCAG 
CGGCTGCCGGCCTCGTCGTCCTCCACCTACGGGGATGACCTGGCTTCTGTGGCCCCGGGG 
CCTTTACAGCAGGACGTGAAGCTGAATGGAGCCGGCCTTGAGGTGGAGGACTCAGACCCT 
GAGCCTGAAGGGGAGGCGGAGGACAGGGTCACAGCCGGGCCTCGGCCTCCTGTGAGCAGC 
GGGAACCTGGAAAGCAGCACCCAGAGCGCCAGCCCGTTCCACAGATGGGGCCAGACTCCG 
GCCCCTCAGAGACATAGTGGCCGGGTGGTGGGGGCCCACAGGCCAGGGCTTCTGAGCCCT 
GTCTTCGTCTACAGCCCAGCCTTTCAGAGTGGTGGGGAGGAGGGTTTATGGATGTCAAAC 
ACCTGCACCTTGAGATAATCCTACAACCACATGCAGTTGTGGGACCGCAGTTTGGTCCTG 
GGGACCATTCATACCCACACACCCAGCTTGTGCCTGTGGTTAACATCTCAGAAAACTCTG 
GTAAATGATCACTCCAGGATATTGACACGAATACACGTTACTGATCTTACTCACATGTTC 
TGGGGTGCACATGAACTTTGTGTGTGCATGTGTGTGTGTGTGCATGTGTGTGTCCCGGGC 
ACCTGACACCCCCAGCCCAGGGCTGCCCAAAGTTGGGCTGATCAGAGACATAGACCCAAT 
GAGGAGCCCAACAGTGGCCCTCCAACCCTCTGCCTTGCCCCCATAGTTCATGCCCCAGTG 
GTCTTTGAAACTGCCCTGTGCCACTCCCTGGAGTGAGCAGCGGTGTCTCTGTGTGTGTGT 
GTGTCTGTG 



In a search of public sequence databases, the NOV11b nucleic acid sequence, located 
on chromosome 19, has 5815 of 5817 bases (99%) identical to a 
gb:GENBANK-ID:AB033109|acc:AB033109.1 mRNA from Homo sapiens (Homo sapiens mRNA for 
KIAA1283 protein, partial cds). Public nucleotide databases include all GenBank databases 
and the GeneSeq patent database. 

The disclosed NOV11b polypeptide (SEQ ID NO:28) encoded by SEQ ID NO:27 has 
1885 amino acid residues and is presented in Table 11D using the one-letter amino acid code. 
SignalP, Psort and/or Hydropathy results predict that NOV11b has a signal peptide and is 
likely to be localized at the plasma membrane with a certainty of 0.4600. The most likely 
cleavage site for a NOV11b peptide is between amino acids 25 and 26. 



Table 11D. Encoded NOV11b protein sequence (SEQ ID NO:28). 

MSGALLWPLLPLLLLLLSARDGVRAAQPQAPGYLIAAPSVFRAGVEEVISVTIFNSPREV 
TVQAQLVAQGEPVVQSQGAILDKGTIKLKVPTGLRGQALLKVWGRGWQAEEGPLFHNQTS 
VTVDGRGASVFIQTDKPVYRPQHRVLISIFTVSPNLRPVNEKLEAYILDPRGSRMIEWRH 
LKPFCCGITNMSFPLSDQPVLGEWFIFVEMQGHAYNKSFEVQKYVLPKFELLIDPPRYIQ 
DLDACETGTVRARYTFGKPVAGALMINMTVNGVGYYSHEVGRPVLRTTKILGSRDFDICV 
RDMIPADVPEHFRGRVSIWAMVTSVDGSQQVAFDDSTPVQRQLVDIRYSKDTRKQFKPGL 
AYVGKVELSYPDGSPAEGVTVQIKAELTPKDNIYTSEVVSQRGLVGFEIPSIPTSAQHVW 
LETKVMALNGKPVGAQYLPSYLSLGSWYSPSQCYLQLQPPSHPLQVGEEAYFSVKSTCPC 
NFTLYYEVAARGNIVLSGQQPAHTTQQRSKRAAPALEKPIRLTHLSETEPPPAPEAEVDV 
CVTSLHLAVTPSMVPLGRLLVFYVRENGEGVADSLQFAVETFFENQVSVTYSANETQPGE 
VVDLRIRAARGSCVCVAAVDKSVYLLRSGFRLTPAQVFQELEDYDVSDSFGVSREDGPFW 
WAGLTAQRRRRSSVFPWPWGITKDSGFAFTETGLVVMTDRVSLNHRQDGGLYTDEAVPAF 
QPHTGSLVAVAPSRHPPRTEKRKRTFFPETWIWHCLNISDPSGEGTLSVKVPDSITSWVG 
EAVALSTSQGLGIAEPSLLKTFKPFFVDFMLPALIIRGEQVKIPLSVYNYMGTCAEVYMK 
LSVPKGIQFVGHPGKRHVTKKMCVAPGEAEPIWVVLSFSDLGLNNITAKALAYGDTNCCR 
DGRSSKHPEENHADRRVPIGVDHVRRSVMVEAEGVPRAYTYSAFFCPSERVHISTPNKYE 
FQYVQRPLRLTRFDVAVRAHNDARVALSSGPQDTAGMIEIVLGGHQNTRSWISTSKMGEP 
VASAHTAKILSWDEFRTFWISWRGGLIQVGHGPEPSNESVIVAWTLPRPPEVQFIGFSTG 
WGSMGEFRIWRKMEVDESYSEAFTLGVPHGAIPGSERATASIIGDVMGPTLNHLNNLLRL 
PFGCGEQNMIHFAPNVFVLKYLQKTQQLSPEVERETTDYLVQGYQRQLTYKRQDGSYSAF 
GERDASGSMWLTAFVLKSFAQARSFIFVDPRELAAAKSWIIQQQQADGSFLAVGRVLNKD 
IQGGIHGIVPLTAYVVVALLETGTASEEERGSTDKARHFLESAAPLAMDPYSCALTTYAL 
TLLRSPAAPEALRKLRSLAIMRDGVTHWSLSNSWDVDKGTFLSFSDRVSQSVVSAEVEMT 
AYALLTYTLLGDVAAALPVVKWLSQQRNALGGFSSTQDTCVALQALAEYAILSYAGGINL 
TVSLASTNLDYQETFELHRTNQKVLQTAAIPSLPTGLFVSAKGDGCCLMQIDVTYNVPDP 
VAKPAFQLLVSLQEPEAQGRPPPMPASAAEGSRGDWPPADDDDPAADQHHQEYKVMLEVC 
TRWLHAGSSNMAVLEVPLLSGFRADIESLEQLLLDKHMGMKRYEVAGRRVLFYFDEIPSR 
CLTCVRFRALRECVVGRTSALPVSVYDYYEPAFEATRFYNVSTHSPLARELCAGPACNEV 
ERAPARGPGWFPGESGPAVAPEEGAAIARCGCDHDCGAQGNPVCGSDGVVYASACRLREA 
ACRQAAPLEPAPPSCCALEQRLPASSSSTYGDDLASVAPGPLQQDVKLNGAGLEVEDSDP 
EPEGEAEDRVTAGPRPPVSSGNLESSTQSASPFHRWGQTPAPQRHSGRVVGAHRPGLLSP 
VFVYSPAFQSGGEEGLWMSNTCTLR 
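A predicted cleavage site "between amino acids 25 and 26" divides the encoded protein into a signal peptide (residues 1 to 25) and the remaining chain (residue 26 onward). The following sketch simply splits a sequence at that position; the function name `cleave_signal_peptide` is hypothetical, and the cleavage position is taken from the prediction stated above rather than computed.

```python
def cleave_signal_peptide(protein: str, last_signal_residue: int = 25):
    """Split a protein after the given 1-based residue number.

    Returns (signal_peptide, remaining_chain).
    """
    if not 0 < last_signal_residue < len(protein):
        raise ValueError("cleavage position must fall inside the sequence")
    return protein[:last_signal_residue], protein[last_signal_residue:]


# First 60 residues of Table 11D.
first_line = "MSGALLWPLLPLLLLLLSARDGVRAAQPQAPGYLIAAPSVFRAGVEEVISVTIFNSPREV"
signal, remainder = cleave_signal_peptide(first_line)
print(signal)
print(remainder)
```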



A search of sequence databases reveals that the NOV11b amino acid sequence has 
1882 of 1884 amino acid residues (99%) identical to, and 1883 of 1884 amino acid residues 
(99%) similar to, the 1884 amino acid residue ptnr:SPTREMBL-ACC:Q9ULD7 protein from 
Homo sapiens (Human) (KIAA1283 PROTEIN). Public amino acid databases include the 
GenBank databases, SwissProt, PDB and PIR. 

NOV11b is expressed in at least Adrenal Gland/Suprarenal gland, Bone Marrow, 
Brain, Heart, Kidney, Lung, Lymphoid tissue, Mammary gland/Breast, Pituitary Gland, 
Placenta, Prostate, Retina, Salivary Glands, Spleen, Thalamus, and Thyroid. This information was 
derived by determining the tissue sources of the sequences that were included in the invention, 
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or 
RACE sources. 



NOV11c 

A disclosed NOV11c nucleic acid of 6157 nucleotides (also referred to as 
CG57488_03) encoding an alpha-macroglobulin-like protein is shown in Table 11E. Putative 
untranslated regions upstream and/or downstream from the coding region, if any, are 
underlined, and the start and stop codons are in bold letters. 



Table 11E. NOV11c nucleotide sequence (SEQ ID NO:29). 

GTGAGTAAGTGAGGGGACGATCCCCGGAAGGGATCGGGGCGGGTCGGGGTCCGGAGATGG 
GCGGAGCAGGCGTCCCGGGAGGGTGCGCCCAGGAGCGGGGCGAGCGGGGCGAGCGGGGCG 
GTCCCGGAGACGAGGCGGGTCCGGGGAGGGGGCTGGCCCGGGGCTGCCCCAGCTTGGCCG 
GGCGCGGAGCGGGGCGCATGGCGCCGGGCGCACTGCGCGGGGGCTGCGAACAAAGGGCCC 
CCGGCGGCGGCGCGAGGACGGCCGCGCTCGGACCCTGGCCCTGGCCCAGCCCTGGCCCGG 
CCCCCTCCCCAGGCGCGGCGCCCCCCAGGAGCCGAAAAATGAGCGGCGCCCTGCTCTGGC 
CGTTGCTCCCGCTCCTGCTCCTGCTGCTGTCGGCGCGGGACGGCGTGCGCGCCGCGCAGC 
CTCAGGCCCCGGGTTACTTGATTGCAGCTCCCTCTGTTTTTCGCGCGGGCGTGGAGGAAG 
TCATCAGCGTGACCATCTTTAACTCTCCAAGGGAAGTCACGGTCCAGGCTCAGCTGGTGG 
CCCAGGGTGAGCCGGTGGTGCAGAGCCAGGGAGCCATCCTGGATAAAGGGACAATCAAAC 
TCAAGGTGCCCACGGGCCTCCGGGGCCAAGCGCTTCTGAAAGTGTGGGGCCGCGGCTGGC 
AGGCGGAGGAGGGGCCCCTCTTTCACAACCAGACCTCGGTGACCGTGGACGGCCGGGGCG 
CTTCTGTATTCATCCAGACGGACAAGCCTGTGTACAGACCCCAGCACCGAGTGCTCATAA 
GCATCTTCACCGTCTCTCCAAATCTGAGGCCTGTCAACGAGAAGCTGGAAGCCTACATCC 
TGGACCCCCGAGGCTCTCGGATGATAGAGTGGAGACACTTAAAGCCGTTCTGCTGCGGCA 
TCACCAACATGAGCTTCCCCTTGTCCGACCAGCCTGTGTTGGGAGAATGGTTCATTTTTG 
TTGAAATGCAAGGCCACGCGTACAACAAGTCTTTTGAAGTTCAGAAGTATGTGTTGCCCA 
AGTTTGAGCTTCTGATTGACCCGCCCCGGTATATCCAAGACCTGGACGCCTGTGAGACAG 
GCACTGTGCGGGCCAGGTATACCTTTGGGAAACCTGTGGCTGGTGCCTTAACGATCAACA 
TGACTGTTAATGGTGTAGGGTACTACAGCCACGAGGTGGGACGCCCTGTCCTCAGAACAA 
CCAAGATCCTCGGCTCCCAGGACTTCGACATCTGCGTGAGGGACATGATCCCAGCGGACG 
TCCCTGAGCACTTCCGGGGCAGGGTCAGCATCTGGGCCATGGTGACCAGTGTGGACGGGA 
GCCAGCAGGTCGCGTTCGATGACTCCACCCCCGTGCAGAGGCAGCTGGTGGACATCCGGT 
ACTCCAAGGACACGAGGAAGCAGTTCAAGCCGGGCCTGGCCTACGTGGGGAAGGTGGAGC 
TATCCTACCCCGATGGCAGCCCAGCTGAGGGGGTGACGGTCCAGATTAAGGCAGAGCTGA 
CACCAAAGGATAACATCTACACCAGTGAAGTTGTGTCCCAGCGTGGACTAGTGGGGTTTG 
AAATCCCCTCCATCCCCACGTCAGCCCAGCACGTGTGGCTGGAGACCAAGGTGATGGCAC 
TGAACGGGAAGCCCGTGGGGGCTCAGTACCTGCCCAGCTACCTCTCCCTCGGCAGCTGGT 
ACTCCCCCAGCCAGTGCTACCTGCAGCTGCAGCCACCCTCCCACCCACTGCAGGTTGGGG 
AAGAAGCCTATTTTTCTGTGAAGTCCACATGTCCCTGCAACTTTACCCTGTACTACGAGG 
TGGCTGCACGGGGCAATATTGTGCTATCGGGCCAGCAGCCTGCCCACACCACCCAGCAGC 
GAAGCAAGCGGGCGGCCCCTGCCCTGGAGAAACCGATTCGTTTAACACACCTTTCTGAGA 
CAGAGCCCCCACCAGCCCCAGAAGCTGAGGTCGACGTGTGTGTGACCTCTCTTCATCTGG 
CCGTGACCCCCAGCATGGTCCCCCTTGGTCGCCTGCTGGTCTTCTACGTCAGGGAGAATG 
GAGAAGGGGTCGCCGACAGCCTTCAGTTTGCAGTCGAGACCTTCTTCGAAAACCAGGTTT 
CAGTGACGTATTCAGCAAATGAGACCCAACCTGGGGAGGTTGTCGACCTGCGGATCAGGG 
CTGCAAGGGGCAGCTGTGTGTGCGTCGCCGCAGTTGATAAGAGTGTCTACCTGCTCAGGT 
CTGGGTTCCGGCTGACTCCTGCCCAGGTTTTCCAGGAACTGGAAGATTATGATGTTTCTG 
ATTCCTTTGGCGTGTCCAGGGAGGATGGTCCTTTTTGGTGGGCTGGGCTGACGGCACAAC 
GACGCCGGCGCTCCTCTGTCTTCCCGTGGCCTTGGGGCATCACCAAGGACTCTGGGTTTG 
CCTTCACCGAAACGGGACTGGTGGTGATGACCGACCGAGTGAGCCTGAACCACCGGCAGG 
ACGGTGGCCTCTACACCGATGAGGCTGTCCCCGCTTTCCAGCCCCACACAGGGAGCCTGG 
TGGCAGTGGCTCCTTCCAGGCACCCCCCCAGAACAGAGAAGAGAAAAAGGACTTTCTTCC 
CCGAAACATGGATTTGGCATTGTCTCAACATCAGTGACCCATCTGGTGAGGGGACACTCA 
GTGTGAAGGTCCCGGACTCCATCACCAGCTGGGTGGGTGAGGCCGTGGCCCTGTCCACCT 
CTCAGGGCTTAGGCATCGCCGAGCCCTCCCTGCTGAAGACCTTCAAGCCCTTCTTCGTGG 
ACTTCATGCTCCCCGCTCTCATCATCCGTGGGGAGCAGGTCAAGATCCCGCTCAGTGTCT 
ACAACTACATGGGCACCTGCGCTGAGGTGTACATGAAGCTCTCGGTTCCCAAGGGCATCC 
AGTTTGTTGGGCATCCTGGCAAACGCCATGTGACCAAGAAGATGTGTGTGGCCCCCGGGG 
AGGCTGAGCCCATCTGGGTCGTTCTGTCCTTCAGCGACCTGGGACTCAACAACATCACGG 
CCAAAGCCCTTGCTTACGGAGACACAAATTGCTGCCGGGATGGGAGGTCCAGCAAACACC 
CTGAGGAGAATCACGCCGACAGGAGGGTCCCCATCGGGGTGGATCACGTCAGGCGCAGTG 
TGATGGTTGAGGCGGAAGGAGTCCCCCGGGCGTACACCTACAGCGCATTCTTCTGTCCCA 
GTGAGAGAGTCCACATCTCCACCCCCAACAAGTATGAGTTCCAGTATGTGCAGCGGCCAC 
TGCGCCTCACCCGCTTTGATGTGGCTGTGCGAGCTCACAATGATGCCCGTGTGGCCTTGT 
CTTCTGGGCCCCAGGACACAGCAGGCATGATCGAGATCGTCCTGGGGGGGCATCAGAACA 
CCAGGTCATGGATCTCCACCAGCAAGATGGGAGAGCCCGTGGCCAGTGCACACACGGCCA 
AGATCCTCTCCTGGGATGAATTCAGAACATTCTGGATCAGCTGGCGTGGTGGCCTTATCC 
AGGTTGGCCATGGTCCAGAGCCATCCAATGAGTCTGTCATTGTGGCCTGGACCCTCCCGA 
GGCCACCAGAGGTCCAGTTCATTGGCTTTTCCACCGGCTGGGGCTCCATGGGTGAATTCC 
GAATCTGGAGGAAGATGGAGGTGGACGAGAGCTACAGCGAGGCCTTCACCCTGGGGGTCC 
CACACGGCGCCATCCCTGGGTCTGAGCGAGCCACCGCCTCCATCATCGGGGACGTCATGG 
GGCCAACCCTGAACCACCTCAACAACCTCCTGCGGCTGCCGTTTGGCTGTGGAGAGCAGA 
ACATGATCCACTTTGCACCCAACGTCTTTGTCTTGAAGTATCTTCAGAAAACCCAGCAGC 
TCAGCCCTGAGGTGGAGAGAGAGACCACCGACTACCTAGTACAAGGCTACCAGCGCCAGC 
TGACCTACAAGCGCCAGGATGGCTCCTACAGCGCGTTTGGGGAGCGGGACGCATCGGGGA 
GCATGTGGCTCACAGCCTTTGTCCTGAAGTCCTTCGCACAGGCTCGCAGCTTTATCTTCG 
TGGACCCCCGGGAGCTGGCTGCCGCCAAGAGCTGGATCATCCAGCAGCAGCAGGCCGATG 
GCTCCTTCCTGGCCGTGGGCAGGGTCCTGAACAAGGACATCCAGGGTGGGATCCACGGCA 
TTGTCCCGCTGACAGCCTACGTGGTGGTTGCTCTCCTGGAAACAGGCACAGCCTCAGAGG 
AGGAGAGAGGCTCCACTGACAAAGCGAGGCACTTCCTGGAGTCTGCTGCGCCCCTGGCCA 
TGGACCCTTATAGCTGTGCCCTGACTACCTACGCGCTGACCCTGCTCCGCAGCCCGGCAG 
CCCCTGAGGCACTGCGCAAGCTCCGTAGCCTGGCCATCATGCGAGATGGGGTCACCCACT 
GGAGCCTGTCAAATTCCTGGGACGTGGACAAGGGCACATTCTTGAGCTTCAGTGACAGGG 
TCTCTCAGTCAGTGGTCTCGGCCGAGGTGGAAATGACAGCCTACGCCCTTCTGACCTACA 
CTCTGCTGGGTGACGTGGCTGCCGCCCTGCCTGTGGTGAAGTGGCTGTCCCAGCAGCGAA 
ATGCACTTGGGGGCTTCTCCTCCACTCAGGACACCTGCGTGGCTCTGCAGGCCTTGGCTG 
AATATGCCATCTTGTCCTATGCTGGAGGCATCAACCTCACTGTCTCCCTGGCCTCCACCA 
ACCTGGACTACCAGGAAACCTTCGAGCTGCACAGGACCAACCAGAAGGTTCTGCAGACAG 
CAGCGATCCCCAGCCTCCCCACGGGGCTGTTTGTGAGTGCCAAGGGGGACGGCTGCTGCC 
TGATGCAGATTGATGTCACCTACAATGTGCCTGACCCGGTGGCCAAGCCAGCTTTCCAGC 
TGCTCGTAAGCCTCCAGGAGCCTGAGGCCCAGGGACGCCCGCCCCCCATGCCTGCCTCCG 
CAGCTGAGGGTTCCCGAGGAGACTGGCCCCCAGCTGACGATGATGACCCAGCGGCCGATC 
AGCATCACCAGGAATACAAGGTGATGCTGGAGGTGTGCACCAGGTGGCTGCATGCAGGGT 
CTTCCAATATGGCTGTCCTGGAGGTGCCCCTGCTGTCAGGCTTCCGGGCAGACATCGAGA 
GCCTGGAGCAGCTGCTCCTTGACAAGCACATGGGGATGAAGAGGTATGAAGTGGCTGGAC 
GCCGAGTGCTCTTCTACTTTGATGAGATCCCCAGCCGGTGCCTGACGTGCGTGCGGTTCC 
GTGCTCTCCGGGAGTGCGTGGTGGGCAGGACGTCGGCGCTGCCAGTCTCCGTGTACGACT 
ACTACGAACCCGCCTTCGAGGCCACTCGCTTCTACAACGTCAGCACGCACAGCCCACTCG 
CCCGGGAACTGTGCGCCGGACCCGCGTGCAACGAAGTGGAGCGCGCCCCTGCCCGGGGCC 
CGGGCTGGTTCCCCGGCGAGTCGGGCCCTGCCGTGGCCCCTGAGGAGGGGGCGGCGATCG 
CGCGATGCGGCTGCGACCACGACTGCGGCGCCCAGGGGAACCCGGTGTGCGGCTCCGACG 
GGGTGGTCTACGCCAGCGCCTGCCGCCTGCGGGAGGCCGCCTGCCGCCAGGCCGCGCCCC 
TGGAGCCCGCGCCTCCCAGCTGCTGCGCCCTCGAGCAGCGGCTGCCGGCCTCGTCGTCCT 
CCACCTACGGGGATGACCTGGCTTCTGTGGCCCCGGGGCCTTTACAGCAGGACGTGAAGC 
TGAATGGAGCCGGCCTTGAGGTGGAGGACTCAGACCCTGAGCCTGAAGGGGAGGCGGAGG 
ACAGGGTCACAGCCGGGCCTCGGCCTCCTGTGAGCAGCGGGAACCTGGAAAGCAGCACCC 
AGAGCGCCAGCCCGTTCCACAGATGGGGCCAGACTCCGGCCCCTCAGAGACATAGTGGCC 
GGGTGGTGGGGGCCCACAGGCCAGGGCTTCTGAGCCCTGTCTTCGTCTACAGCCCAGCCT 
TTCAGAGTGGTGGGGAGGAGGGTTTATGGATGTCAAACACCTGCACCTTGAGATAATCCT
ACAACCACATGCAGTTGTGGGACCGCAGTTTGGTCCTGGGGACCATTCATACCCACACAC 
CCAGCTTGTGCCTGTGGTTAACATCTCAGAAAACTCTGGTAAATGATCACTCCAGGATAT 
TGACACGAATACACGTTACTGATCTTACTCACATGTT

In a search of public sequence databases, the NOV11C nucleic acid sequence, located
on chromosome 19, has 332 of 513 bases (64%) identical to GENBANK-ID:GPIMSPB|acc:D84339.1,
Cavia porcellus mRNA for murinoglobulin. Public nucleotide
databases include all GenBank databases and the GeneSeq patent database.
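
The percentages quoted for database hits in this section are consistent with the number of identical positions divided by the alignment length, expressed as a whole percent. A minimal Python sketch of that arithmetic is given here; the function name `percent_identity` is hypothetical, and truncation (rather than rounding) is an assumption inferred from the reported figures, not a documented property of the search tool.

```python
def percent_identity(identical: int, aligned: int) -> int:
    """Return identical/aligned as a whole percent, truncated toward zero.

    Truncation is an assumption chosen to match the figures quoted in the text.
    """
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    if not 0 <= identical <= aligned:
        raise ValueError("identical count must lie between 0 and the aligned length")
    return identical * 100 // aligned


# 332 identical bases over a 513-base alignment, as reported for NOV11C
print(percent_identity(332, 513))
```

The same helper can be applied to the amino acid comparisons, where the counts are residues rather than bases.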

The disclosed NOV11C polypeptide (SEQ ID NO:30) encoded by SEQ ID NO:29 has
1979 amino acid residues and is presented in Table 11F using the one-letter amino acid code.
SignalP, Psort and/or Hydropathy results predict that NOV11C has no signal peptide and is
likely to be localized at the plasma membrane with a certainty of 0.6000.



Table 11F. Encoded NOV11C protein sequence (SEQ ID NO:30). 

MGGAGVPGGCAQERGERGERGGPGDEAGPGRGLARGCPSLAGRGAGRMAPGALRGGCEQR 
APGGGARTAALGPWPWPSPGPAPSPGAAPPRSRKMSGALLWPLLPLLLLLLSARDGVRAA 
QPQAPGYLIAAPSVFRAGVEEVISVTIFNSPREVTVQAQLVAQGEPWQSQGAILDKGTI 
KLKVPTGLRGQALLKVWGRGWQAEEGPLFHNQTSVTVDGRGASVFIQTDKPVYRPQHRVL 
ISIFTVSPNLRPVNEKLEAYILDPRGSRMIEWRHLKPFCCGITNMSFPLSDQPVLGEWFI
FVEMQGHAYNKSFEVQKWLPKFELLIDPPRYIQDLDACETGTVRARYTFGKPVAGALTI 
ISHVITVNGVGYYSHEVGRPVLRTTKILGSQDFDICVRDMIPADVPEHFRGRVSIWAMVTSVD 
GSQQVAFDDSTPVQRQLVDIRYSKDTRKQFKPGLAYVGKVELSYPDGSPAEGVTVQIKAE 
LTPKI)NIYTSEWSQRGLiVGFEIPSIPTSAQHWLETKWlALNGKPVGAQYLPSYLSLGS 
WYSPSQCYLQLQPPSHPLQVGEEAYFSVKSTCPCNFTLYYEVAARGNIVLSGQQPAHTTQ 
QRSKJUU^PALEKPIRLTHLSETEPPPAPEA£VDVCVTSLHIJ\VTPSMVPLGRLLVFYVRE 
NGEGVADSLQFAVETFFENQVSVTYSANETQPGEWDLRIRAARGSCVCVAAVDKSVYLL 
RSGFRLTPAQVFQELEDYDVSDSFGVSREDGPFWWAGLTAQRRRRSSVFPWPWGITKDSG 
FAFTETGLWMTDRVSLNHRQDGGLYTDEAVPAFQPHTGSLVAVAPSRHPPRTEKRKRTF
FPETWIWHCLNISDPSGEGTLSVKVPDSITSWVGEAVALSTSQGLGIAEPSLLKTFKPFF 
VDFMLPALIIRGEQVKIPLSVYNYMGTCAEVYMKLSVPKGIQFVGHPGKRHVTKKMCVAP
GEAEPIWWLSFSDLGLNNITAKALAYGDTNCCRDGRSSKHPEENHADRRVPIGVDHVRR 
SVMVEAEGVPRAYTYSAFFCPSERVHISTPNKYEFQYVQRPLRLTRFDVAVRAHNDARVA 
LSSGPQDTAGMIEIVLGGHQNTRSWISTSKMGEPVASAHTAKILSWDEFRTFWISWRGGL 
IQVGHGPEPSNESVIVAWTLPRPPEVQFIGFSTGWGSMGEFRIWRKMEVDESYSEAFTLG 
VPHGAIPGSERATASIIGDVMGPTLNHLNNLLRLPFGCGEQNMIHFAPNVFVLKYLQKTQ
QLSPEVERETTDYLVQGYQRQLTYKRQDGSYSAFGERDASGSMWLTAFVLKSFAQARSFI 
FVDPRELAAAKSWIIQQQQADGSFLAVGRVLNKDIQGGIHGIVPLTAYVVVALLETGTAS
EEERGSTDKARHFLESAAPLAMDPYSCALTTYALTLLRSPAAPEALRKLRSLAIMRDGVT 
HWSLSNSWDVDKGTFLSFSDRVSQSVVSAEVEMTAYALLTYTLLGDVAAALPVVKWLSQQ
RNALGGFSSTQDTCVALQALAEYAILSYAGGINLTVSLASTNLDYQETFELHRTNQKVLQ 
TAAIPSLPTGLFVSAKGDGCCLMQIDVTYNVPDPVAKPAFQLLVSLQEPEAQGRPPPMPA 
SAAEGSRGDWPPADDDDPAADQHHQEYKVMLEVCTRWLHAGSSNMAVLEVPLLSGFRADI 
ESLEQLLLDKHMGMKRYEVAGRRVLFYFDEIPSRCLTCVRFRALRECVVGRTSALPVSVY
DYYEPAFEATRFYNVSTHSPLARELCAGPACNEVERAPARGPGWFPGESGPAVAPEEGAA 
IARCGCDHDCGAQGNPVCGSDGVVYASACRLREAACRQAAPLEPAPPSCCALEQRLPASS
SSTYGDDLASVAPGPLQQDVKLNGAGLEVEDSDPEPEGEAEDRVTAGPRPPVSSGNLESS 
TQSASPFHRWGQTPAPQRHSGRVVGAHRPGLLSPVFVYSPAFQSGGEEGLWMSNTCTLR



A search of sequence databases reveals that the NOV11C amino acid sequence has 171
of 432 amino acid residues (39%) identical to, and 258 of 432 amino acid residues (58%)
similar to, the guinea pig protein ptnr:SPTREMBL-ACC:Q60486 ALPHA-MACROGLOBULIN
PRECURSOR - Cavia porcellus. Public amino acid databases include
the GenBank databases, SwissProt, PDB and PIR.



The disclosed NOV11a polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 11G.



Table 11G. BLAST results for NOV11

Gene Index/Identifier                     Protein/Organism                                            Length (aa)  Identity (%)     Positives (%)    Expect
gi|6331358|dbj|BAA86597.1| (AB033109)     KIAA1283 protein [Homo sapiens]                             1884         1882/1926 (97%)  1883/1926 (97%)  0.0
gi|15302736|ref|XP_050563.2| (XM_050563)  KIAA1283 protein [Homo sapiens]                             1711         1710/1711 (99%)  1710/1711 (99%)  0.0
gi|18567969|ref|XP_095282.1| (XM_095282)  Similar to KIAA1283 protein [Homo sapiens]                  837          665/732 (90%)    667/732 (90%)    0.0
gi|13928544|dbj|BAB47146.1| (AB050668)    complement component C3 [Branchiostoma belcheri]            1732         220/611 (36%)    313/611 (51%)    4e-92
gi|17975514|ref|NP_523506.1| (NM_078782)  Thiolester containing protein II [Drosophila melanogaster]  1420         195/583 (33%)    312/583 (53%)    4e-85



Tables 11H and 11I list the domain descriptions from DOMAIN analysis results against
NOV11. This indicates that the NOV11 sequence has properties similar to those of other
proteins known to contain these domains.



Table 11H. Domain Analysis of NOV11 



gnl|Pfam|pfam00207, A2M, Alpha-2-macroglobulin family. This family
includes the C-terminal region of the alpha-2-macroglobulin family.
CD-Length = 751 residues, Score = 330 bits (847), Expect = 3e-91



Table 11I. Domain Analysis of NOV11

gnl|Smart|smart00280, KAZAL, Kazal type serine protease inhibitors;
Kazal type serine protease inhibitors and follistatin-like domains.

CD-Length = 46 residues, Score = 52.0 bits (123), Expect = 3e-07

NOV11 is a member of the alpha-macroglobulin family. Alpha-macroglobulin proteins
are large extracellular glycoproteins that can bind to and often act as reservoirs of growth
factors and extracellular enzymes (See Gonias et al., J Biol Chem 2000 Feb 25;275(8):5826-31).
A decreased level of these proteins in serum is often a sign of tissue damage (See Ruaux et
al., Res Vet Sci 1999 Aug;67(1):83-7; Levine et al., J Pediatr Gastroenterol Nutr 1989
Nov;9(4):517-20; Wiedermann et al., Neoplasma 1978;25(2):189-96). These proteins may also
help defend the body against bacterial or parasitic infection (See Araujo-Jorge et al., Parasitol
Res 1992;78(3):215-21).

The disclosed NOV11 nucleic acid of the invention encoding an alpha-macroglobulin-like
protein includes the nucleic acid whose sequence is provided in Table 11A, 11C or 11E or
a fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose
bases may be changed from the corresponding base shown in Table 11A, 11C or 11E while
still encoding a protein that maintains its alpha-macroglobulin-like activities and physiological
functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids
whose sequences are complementary to those just described, including nucleic acid fragments
that are complementary to any of the nucleic acids just described. The invention additionally
includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures
include chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 1 percent of the bases may be so changed.
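
The complementary sequences referred to in the preceding paragraph can be derived mechanically from any disclosed strand. The following is a minimal Python sketch, assuming unmodified DNA written with the letters A, C, G and T; the function name `reverse_complement` is hypothetical and is not part of the disclosure.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


# A short illustrative fragment (the first nine bases of an open reading frame)
print(reverse_complement("ATGGCTCAT"))
```

Characters other than A, C, G and T are passed through unchanged, so ambiguity codes or modified bases would need separate handling.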

The disclosed NOV11 protein of the invention includes the alpha-macroglobulin-like
protein whose sequence is provided in Table 11B, 11D or 11F. The invention also includes a
mutant or variant protein any of whose residues may be changed from the corresponding
residue shown in Table 11B, 11D or 11F while still encoding a protein that maintains its
alpha-macroglobulin-like activities and physiological functions, or a functional fragment
thereof. In the mutant or variant protein, up to about 1 percent of the residues may be so
changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this alpha-macroglobulin-like
protein (NOV11) may function as a member of an "alpha-macroglobulin
family". Therefore, the NOV11 nucleic acids and proteins identified here may be useful in
potential therapeutic applications implicated in (but not limited to) various pathologies and
disorders as indicated below. The potential therapeutic applications for this invention include,
but are not limited to: protein therapeutic, small molecule drug target, antibody target
(therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic
marker, gene therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo
and in vitro of all tissues and cell types composing (but not limited to) those defined here.

The NOV11 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the alpha-macroglobulin-like
protein (NOV11) may be useful in gene therapy, and the alpha-macroglobulin-like protein
(NOV11) may be useful when administered to a subject in need thereof. By way of
nonlimiting example, the compositions of the present invention will have efficacy for
treatment of patients suffering from cancer, trauma, regeneration (in vitro and in vivo),
viral/bacterial/parasitic infections, adrenoleukodystrophy, congenital adrenal hyperplasia,
hemophilia, hypercoagulation, idiopathic thrombocytopenic purpura, autoimmune disease,
allergies, immunodeficiencies, transplantation, graft versus host disease, cardiomyopathy,
atherosclerosis, hypertension, congenital heart defects, aortic stenosis, atrial septal defect
(ASD), atrioventricular (A-V) canal defect, ductus arteriosus, pulmonary stenosis, subaortic
stenosis, ventricular septal defect (VSD), valve diseases, tuberous sclerosis, scleroderma,
obesity, systemic lupus erythematosus, autoimmune disease, asthma, emphysema,
scleroderma, allergy, ARDS, renal artery stenosis, interstitial nephritis, glomerulonephritis,
polycystic kidney disease, systemic lupus erythematosus, renal tubular acidosis, IgA
nephropathy, hypercalcemia, Lesch-Nyhan syndrome, Von Hippel-Lindau (VHL) syndrome,
diabetes, tuberous sclerosis, xerostomia, fertility, endocrine dysfunctions, growth and
reproductive disorders, or other pathologies or conditions. The NOV11 nucleic acid encoding
the alpha-macroglobulin-like protein of the invention, or fragments thereof, may further be
useful in diagnostic applications, wherein the presence or amount of the nucleic acid or the
protein is to be assessed.

NOV11 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immunospecifically to the novel NOV11 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV11 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.



NOV12 

NOV12 includes two orphan transporter-like proteins disclosed below. The disclosed
sequences have been named NOV12a and NOV12b.

NOV12a

A disclosed NOV12a nucleic acid of 2119 nucleotides (also referred to as CG57526-01)
encoding an orphan receptor-like protein is shown in Table 12A. Putative untranslated
regions upstream and/or downstream from the coding region, if any, are underlined, and the
start and stop codons are in bold letters.



Table 12A. NOV12a nucleotide sequence (SEQ ID NO:31). 



TGTGGTTTCCAAACGTCGGCAGAGGCTGGAGACGGCTCTCTAGTGCTGGGTGTGGAGTGA
GGCACCACCCTCGCCCTGAAGCCTGGGGCACTCAGTCACCATGGCTCATGCCCCAGAACC
AGACCCGGCCGCCAGCGACCTCGGGGATGAGAGGCCCAAGTGGGACAACAAGGCCCAGTA
CCTCCTGAGCTGCATCGGGTTTGCCGTGGGGCTGGGGAACATTTGGCGGTTCCCATACCT 
GTGCCAGACCTATGGAGGAGGTGCCTTCCTCATCCCCTACGTCATCGCGCTGGTCTTCGA 
GGGGATCCCCATTTTCCACGTCGAGCTCGCCATCGGCCAGCGGCTGCGGAAGGGCAGCGT 
CGGCGTGTGGACGGCCATCTCCCCGTACCTCAGTGGAGTAGGTCTGGGCTGTGTCACGCT 
GTCCTTCCTGATCAGCCTGTACTACAACACCATCGTGGCGTGGGTGCTGTGGTACCTCCT 
CAACTCCTTCCAGCACCCGCTGCCCTGGAGCTCCTGCCCACCGGACCTCAACAGAACAGG 
TTTTGTGGAGGAGTGCCAGGGCAGCAGCGCCGTGAGCTACTTCTGGTACCGGCAGACACT 
GAACATCACAGCCGACATCAATGACAGTGGCTCCATCCAGTGGTGGCTGCTCATCTGCTT 
GGCAGCCTCCTGGGCAGTCGTGTACATGTGTGTCATCAGGGGCATTGAGACTACAGGGAA 
GGTGATTTACTTCACAGCTTTGTTCCCTTACCTGGTCCTGACCATCTTTCTCATCAGAGG 
GCTGACCCTGCCAGGGGCAACAAAAGGACTCATCTACTTGTTCACTCCCAACATGCACAT 
TCTCCAGAACCCCCGGGTGTGGCTGGACGCAGCCACCCAGATATTCTTCTCTCTGTCCCT 
GGCCTTCGGAGGACACATCGCTTTTGCAAGTTACAACTCGCCCAGGAGGAATGACTGCCA 
GAAGGATGCGGTGGTCATCGCCCTGGTCAACAGGATGACCTCCCTGTACGCGTCCATCGC 
TGTCTTCTCTGTCCTGGGGTTCAAAGCAACTAATGACCAGGAGCACTGCCTGGACAGGAA 
CATCCTCAGCCTCATCAACGACTTTGACTTCCCAGAGCAGAGCATCTCCAGGGACGACTA 
CCCAGCCGTCCTCATGCACCTGAACGCCACCTGGCCCAAGAGGGTGGCCCAGCTCCCCCT 
GAAGGCCTGCCTCCTGGAAGACTTTCTGGATAAGAGTGCCTCGGGCCCGGGCCTGGCCTT 
CGTCGTCTTCACGGAGACCGACCTCCACATGCCGGGGGCTCCTGTGTGGGCCATGCTCTT 
CTTCGGGATGCTGTTCACCTTGGGGCTATCGACCATGTTCGGGACCGTGGAGGCGGTCAT 
CACACCCCTGCTGGACGTGGGGGTCCTGCCTAGATGGGTCCCCAAGGAGGCCCTGACTGG 
TCCAGGGCTGGTCTGCCTGGTCTGCTTCCTCTCCGCCACCTGCTTCACGCTGCAGTCTGG 
GAACTACTGGCTGGAGATTTTCGACAATTTTGCCGCTTCCCTGAACCTGCTCATGTTGGC 
CTTTCTCGAGGTTGTGGGTGTCGTTTATGTTTATGGAATGAAACGGTTCTGCGATGACAT 
TGCGTGGATGACCGGGAGGCGGCCCAGCCCCTACTGGCGGCTGACCTGGAGGGTGGTCAG 
TCCCCTGCTGCTGACCATCTTTGTGGCTTACATCATCCTCCTGTTCTGGAAGCCACTGAG 
ATACAAGGCCTGGAACCCCAAATACGAGCTGTTCCCCTCGCGTCAGGAGAAGCTCTACCC 
GGGCTGGGCGCGCGCCGCCTGTGTGCTGCTGTCCTTGCTGCCCGTGCTGTGGGTCCCGGT 
GGCCGCGCTTGCTCAGCTGCTCACCCGGCGGAGGCGGACGTGGAGGGACAGGGACGCGCG 
CCCAGACACGGACATGCGCCCGGACACGGACACGCGCCCAGACACGGACATGCGCCCGGA 
CACGGACATGCGCTGAAGCCGGCCGGAGCGGGGCCTGCATGGGCGGGTCTGTGGGGGGGC
TTGGCCTGATGGTGGGCGGGGCCCCGCCCACAGGGCCGACCCCAATACACCAGCGACTCA 
ACCTTAAAAAAAAAAAAAA

In a search of public sequence databases, the NOV12a nucleic acid sequence, located
on chromosome 5, has 1122 of 1396 bases (80%) identical to a gb:GENBANK-ID:AF075263|acc:AF075263.1
mRNA from Mus musculus (Mus musculus orphan transporter
isoform A11 (Xtrp2) mRNA, alternatively spliced, complete cds). Public nucleotide databases
include all GenBank databases and the GeneSeq patent database.

The disclosed NOV12a polypeptide (SEQ ID NO:32) encoded by SEQ ID NO:31 has
631 amino acid residues and is presented in Table 12B using the one-letter amino acid code.
SignalP, Psort and/or Hydropathy results predict that NOV12a has a signal peptide and is
likely to be localized to the plasma membrane with a certainty of 0.8000.
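
The relationship between a disclosed nucleotide sequence and the polypeptide it encodes can be checked by translating the open reading frame with the standard genetic code. The following is a minimal Python sketch under stated assumptions: the function name `translate` and the table layout are illustrative, the input is assumed to begin at the start codon, and only the letters A, C, G and T are handled.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order
# for the first, second and third codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf: str) -> str:
    """Translate an open reading frame into one-letter amino acid code.

    Translation stops at the first stop codon; a trailing partial codon
    is ignored. Whitespace in the input is removed first.
    """
    orf = "".join(orf.split()).upper()
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First ten codons of the NOV12a coding region shown in Table 12A
print(translate("ATGGCTCATGCCCCAGAACCAGACCCGGCC"))
```

Because the sketch raises `KeyError` on any codon containing a character outside A, C, G and T, it also serves as a quick check for extraction errors in a transcribed sequence.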

Table 12B. Encoded NOV12a protein sequence (SEQ ID NO:32). 

MAHAPEPDPAASDLGDERPKWDNKAQYLLSCIGFAVGLGNIWRFPYLCQTYGGGAFLIPY 
VIALVFEGIPIFHVELAIGQRLRKGSVGVWTAISPYLSGVGLGCVTLSFLISLYYNTIVA
WVLWYLLNSFQHPLPWSSCPPDLNRTGFVEECQGSSAVSYFWYRQTLNITADINDSGSIQ
WWLLICLAASWAVVYMCVIRGIETTGKVIYFTALFPYLVLTIFLIRGLTLPGATKGLIYL
FTPNMHILQNPRVWLDAATQIFFSLSLAFGGHIAFASYNSPRRNDCQKDAVVIALVNRMT
SLYASIAVFSVLGFKATNDQEHCLDRNILSLINDFDFPEQSISRDDYPAVLMHLNATWPK
RVAQLPLKACLLEDFLDKSASGPGLAFVVFTETDLHMPGAPVWAMLFFGMLFTLGLSTMF
GTVEAVITPLLDVGVLPRWVPKEALTGPGLVCLVCFLSATCFTLQSGNYWLEIFDNFAAS
LNLLMLAFLEVVGVVYVYGMKRFCDDIAWMTGRRPSPYWRLTWRVVSPLLLTIFVAYIIL
LFWKPLRYKAWNPKYELFPSRQEKLYPGWARAACVLLSLLPVLWVPVAALAQLLTRRRRT 
WRDRDARPDTDMRPDTDTRPDTDMRPDTDMR 


A search of sequence databases reveals that the NOV12a amino acid sequence has 460
of 602 amino acid residues (76%) identical to, and 528 of 602 amino acid residues (87%)
similar to, the 615 amino acid residue ptnr:SPTREMBL-ACC:O88576 protein from Mus
musculus (Mouse) (ORPHAN TRANSPORTER ISOFORM A12). Public amino acid
databases include the GenBank databases, SwissProt, PDB and PIR.

NOV12a is expressed in at least Colon and Kidney. This information was derived by
determining the tissue sources of the sequences that were included in the invention including
but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE
sources.

NOV12b

A disclosed NOV12b nucleic acid of 2039 nucleotides (also referred to as CG57526-02)
encoding an orphan receptor-like protein is shown in Table 12C. Putative untranslated
regions upstream and/or downstream from the coding region, if any, are underlined, and the
start and stop codons are in bold letters.




Table 12C. NOV12b nucleotide sequence (SEQ ID NO:33). 



TGTGGTTTCCAAACGTCGGCAGAGGCTGGAGACGGCTCTCTAGTGCTGGGTGTGGAGTGA 
GGCACCACCCTCGCCCTGAAGCCTGGGGCACTCAGTCACCATGGCTCATGCCCCAGAACC
AGACCCGGCCGCCAGCGACCTCGGGGATGAGAGGCCCAAGTGGGACAACAAGGCCCAGTA 
CCTCCTGAGCTGCATCGGGTTTGCCGTGGGGCTGGGGAACATTTGGCGGTTCCCATACCT 
GTGCCAGACCTATGGAGGAGGTGCCTTCCTCATCCCCTACGTCATCGCGCTGGTCTTCGA 
GGGGATCCCCATTTTCCACGTCGAGCTCGCCATCGGCCAGCGGCTGCGGAAGGGCAGCGT 
CGGCGTGTGGACGGCCATCTCCCCGTACCTCAGTGGAGTAGGTCTGGGCTGTGTCACGCT 
GTCCTTCCTGATCAGCCTGTACTACAACACCATCGTGGCGTGGGTGCTGTGGTACCTCCT 
CAACTCCTTCCAGCACCCGCTGCCCTGGAGCTCCTGCCCACCGGACCTCAACAGAACAGG 
TTTTGTGGAGGAGTGCCAGGGCAGCAGCGCCGTGAGCTACTTCTGGTACCGGCAGACACT 
GAACATCACAGCCGACATCAATGACAGTGGCTCCATCCAGTGGTGGCTGCTGATCTGCTT
GGCAGCCTCCTGGGCAGTCGTGTACATGTGTGTCATCAGGGGCATTGAGACTACAGGGAA 
GGTGATTTACTTCACAGCTTTGTTCCCTTACCTGGTCCTGACCATCTTTCTCATCAGAGG 
GCTGACCCTGCCAGGGGCAACAAAAGGACTCATCTACTTGTTCACTCCCAACATGCACAT 
TCTCCAGAACCCCCGGGTGTGGCTGGACGCAGCCACCCAGATATTCTTCTCTCTGTCCCT 
GGCCTTCGGAGGACACATCGCTTTTGCAAGTTACAACTCGCCCAGGAGGAATGACTGCCA 
GAAGGATGCGGTGGTCATCGCCCTGGTCAACAGGATGACCTCCCTGTACGCGTCCATCGC 
TGTCTTCTCTGTCCTGGGGTTCAAAGCAACTAATGACCAGGAGCACTGCCTGGACAGGAA 
CATCCTCAGCCTCATCAACGACTTTGACTTCCCAGAGCAGAGCATCTCCAGGGACGACTA 
CCCAGCCGTCCTCATGCACCTGAACGCCACCTGGCCCAAGAGGGTGGCCCAGCTCCCCCT 
GAAGGCCTGCCTCCTGGAAGACTTTCTGGATAAGAGTGCCTCGGGCCCGGGCCTGGCCTT 
CGTCGTCTTCACGGAGACCGACCTCCACATGCCGGGGGCTCCTGTGTGGGCCATGCTCTT 
CTTCGGGATGCTGTTCACCTTGGGGCTATCGACCATGTTCGGGACCGTGGAGGCGGTCAT 
CACACCCCTGCTGGACGTGGGGGTCCTGCCTAGATGGGTCCCCAAGGAGGCCCTGACTGG 
TCCAGGGCTGGTCTGCCTGGTCTGCTTCCTCTCCGCCACCTGCTTCACGCTGCAGTCTGG 
GAACTACTGGCTGGAGATTTTCGACAATTTTGCCGCTTCCCTGAACCTGCTCATGTTGGC 
CTTTCTCGAGGTTGTGGGTGTCGTTTATGTTTATGGAATGAAACGGTTCTGCGATGACAT 
TGCGTGGATGACCGGGAGGCGGCCCAGCCCCTACTGGCGGCTGACCTGGAGGGTGGTCAG 
TCCCCTGCTGCTGACCATCTTTGTGGCTTACATCATCCTCCTGTTCTGGAAGCCACTGAG 
ATACAAGGCCTGGAACCCCCAGGAGCTGTTCCCCTCGCGTCAGGAGAAGCTCTACCCGGG 
CTGGGCGCGCGCCGCCTGTGTGCTGCTGTCCTTGCTGCCCGTGCTGTGGGTCCCGGTGGC 
CGCGCTTGCTCAGCTGCTCACCCGGCGGAGGCGGACGTGGAGACAGGCGCATGCTGAGGC 
CGGGCTGGTGTTCCAGGACTTCGAGAAGCAGAGGCCTGGCGTGGGGATACAGTACCTGAT 
TCCAATGCTTTGCAACTTGCTCCAGACACTCTTCCGGTAGAAAAAGAGCCTGTTCTTTT 



In a search of public sequence databases, the NOV12b nucleic acid sequence, located
on chromosome 5, has 1122 of 1396 bases (80%) identical to a gb:GENBANK-ID:AF075263|acc:AF075263.1
mRNA from Mus musculus (Mus musculus orphan transporter
isoform A11 (Xtrp2) mRNA, alternatively spliced, complete cds). Public nucleotide databases
include all GenBank databases and the GeneSeq patent database.

The disclosed NOV12b polypeptide (SEQ ID NO:34) encoded by SEQ ID NO:33 has
639 amino acid residues and is presented in Table 12D using the one-letter amino acid code.
SignalP, Psort and/or Hydropathy results predict that NOV12b has a signal peptide and is
likely to be localized to the plasma membrane with a certainty of 0.8000.



Table 12D. Encoded NOV12b protein sequence (SEQ ID NO:34). 



MAHAPEPDPAASDLGDERPKWDNKAQYLLSCIGFAVGLGNIWRFPYLCQTYGGGAFLIPY 
VIALVFEGIPIFHVELAIGQRLRKGSVGVWTAISPYLSGVGLGCVTLSFLISLYYNTIVA
WVLWYLLNSFQHPLPWSSCPPDLNRTGFVEECQGSSAVSYFWYRQTLNITADINDSGSIQ
WWLLICLAASWAVVYMCVIRGIETTGKVIYFTALFPYLVLTIFLIRGLTLPGATKGLIYL
FTPNMHILQNPRVWLDAATQIFFSLSLAFGGHIAFASYNSPRRNDCQKDAVVIALVNRMT
SLYASIAVFSVLGFKATNDQEHCLDRNILSLINDFDFPEQSISRDDYPAVLMHLNATWPK
RVAQLPLKACLLEDFLDKSASGPGLAFVVFTETDLHMPGAPVWAMLFFGMLFTLGLSTMF
GTVEAVITPLLDVGVLPRWVPKEALTGPGLVCLVCFLSATCFTLQSGNYWLEIFDNFAAS
LNLLMLAFLEVVGVVYVYGMKRFCDDIAWMTGRRPSPYWRLTWRVVSPLLLTIFVAYIIL
LFWKPLRYKAWNPQELFPSRQEKLYPGWARAACVLLSLLPVLWVPVAALAQLLTRRRRTW

A search of sequence databases reveals that the NOV12b amino acid sequence has 465
of 613 amino acid residues (75%) identical to, and 534 of 613 amino acid residues (87%)
similar to, the 615 amino acid residue ptnr:SPTREMBL-ACC:O88576 protein from Mus
musculus (Mouse) (ORPHAN TRANSPORTER ISOFORM A12). Public amino acid
databases include the GenBank databases, SwissProt, PDB and PIR.

NOV12b is expressed in at least Colon and Kidney. This information was derived by
determining the tissue sources of the sequences that were included in the invention including
but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE
sources.

The disclosed NOV12 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 12E.



Table 12E. BLAST results for NOV12

Gene Index/Identifier                     Protein/Organism                                Length (aa)  Identity (%)   Positives (%)  Expect
gi|16550619|dbj|BAB71018.1| (AK055798)    unnamed protein product [Homo sapiens]          628          626/631 (99%)  626/631 (99%)  0.0
gi|3347922|gb|AAC27757.1| (AF075262)      orphan transporter isoform A12 [Mus musculus]   615          460/603 (76%)  528/603 (87%)  0.0
gi|8394204|ref|NP_058859.1| (NM_017163)   X transporter protein 2 [Rattus norvegicus]     615          461/604 (76%)  528/604 (87%)  0.0
gi|3347924|gb|AAC27758.1| (AF075263)      orphan transporter isoform A11 [Mus musculus]   605          454/603 (75%)  520/603 (85%)  0.0
gi|3347926|gb|AAC27759.1| (AF075264)      orphan transporter isoform B11 [Mus musculus]   577          416/603 (68%)  478/603 (78%)  0.0



Table 12G lists the domain descriptions from DOMAIN analysis results against
NOV12. This indicates that the NOV12 sequence has properties similar to those of other
proteins known to contain this domain.

Table 12G. Domain Analysis of NOV12

gnl|Pfam|pfam00209, SNF, Sodium:neurotransmitter symporter family.

CD-Length = 534 residues, 92.5% aligned

Score = 469 bits (1207), Expect = 2e-133



A gene family encoding many Na(+)- and Cl(-)-dependent organic solute
cotransporters has recently been recognized. Among the cotransporters that have been
characterized are those for neurotransmitters, amino acids, and organic osmolytes. The ROSIT
cDNA is 2,354 bp long with an open reading frame of 1,845 bp. The deduced 615-amino acid
sequence shows ROSIT to be most clearly related to two orphan cDNAs of this family isolated from
brain. Northern analysis showed the mRNA is normally expressed in renal cortex but not in
brain, heart, colon, liver, stomach, or skeletal muscle. Moreover, hypernatremic rats displayed
a marked increase in mRNA levels in renal cortex, renal outer medulla, and perhaps intestine.
Heterologous expression of the cRNA in Xenopus laevis oocytes failed to reveal the function
of this gene product when analyzed with isotope fluxes or electrophysiological measurements
using a wide variety of organic solutes. ROSIT is likely to be involved in kidney reclamation
of an organic osmolyte or osmolyte precursor required for adaptation to hypertonic stress. (See
Wasserman et al., Am J Physiol 1994 Oct;267(4 Pt 2):F688-94).

The disclosed NOV12 nucleic acid of the invention encoding an orphan receptor-like
protein includes the nucleic acid whose sequence is provided in Table 12A or 12C or a
fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose
bases may be changed from the corresponding base shown in Table 12A or 12C while still
encoding a protein that maintains its orphan receptor-like activities and physiological
functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids
whose sequences are complementary to those just described, including nucleic acid fragments
that are complementary to any of the nucleic acids just described. The invention additionally
includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures
include chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 20 percent of the bases may be so changed.

The disclosed NOV12 protein of the invention includes the orphan receptor-like
protein whose sequence is provided in Table 12B or 12D. The invention also includes a
mutant or variant protein any of whose residues may be changed from the corresponding
residue shown in Table 12B or 12D while still encoding a protein that maintains its orphan
receptor-like activities and physiological functions, or a functional fragment thereof. In the
mutant or variant protein, up to about 24 percent of the residues may be so changed.
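
The limits on changed bases or residues stated above can be illustrated by counting the positions at which a variant differs from a reference sequence. The following is a minimal Python sketch, assuming substitution-only variants of the same length as the reference (insertions and deletions would require an alignment step that is not shown); the names `fraction_changed` and `within_limit` are hypothetical.

```python
def fraction_changed(reference: str, variant: str) -> float:
    """Fraction of positions at which variant differs from reference.

    Assumes both sequences have the same length (substitutions only).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    differences = sum(
        1 for a, b in zip(reference.upper(), variant.upper()) if a != b
    )
    return differences / len(reference)


def within_limit(reference: str, variant: str, max_fraction: float) -> bool:
    """True if the variant changes no more than max_fraction of positions."""
    return fraction_changed(reference, variant) <= max_fraction


# One substitution in ten positions, checked against a 20 percent limit
print(within_limit("ATGGCTCATG", "ATGGCTCATC", 0.20))
```

The same functions apply to protein sequences written in one-letter code, for example with `max_fraction=0.24`.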

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this orphan receptor-like
protein (NOV12) may function as a member of an "orphan receptor family". Therefore, the
NOV12 nucleic acids and proteins identified here may be useful in potential therapeutic
applications implicated in (but not limited to) various pathologies and disorders as indicated
below. The potential therapeutic applications for this invention include, but are not limited to:
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV12 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the orphan receptor-like
protein (NOV12) may be useful in gene therapy, and the orphan receptor-like protein
(NOV12) may be useful when administered to a subject in need thereof. By way of
nonlimiting example, the compositions of the present invention will have efficacy for
treatment of patients suffering from cancer, trauma, regeneration (in vitro and in vivo),
viral/bacterial/parasitic infections, Hirschsprung's disease, Crohn's Disease, appendicitis,
diabetes, autoimmune disease, renal artery stenosis, interstitial nephritis, glomerulonephritis,
polycystic kidney disease, systemic lupus erythematosus, renal tubular acidosis, IgA
nephropathy, hypercalcemia, Lesch-Nyhan syndrome, or other pathologies or conditions. The
NOV12 nucleic acid encoding the orphan receptor-like protein of the invention, or fragments
thereof, may further be useful in diagnostic applications, wherein the presence or amount of
the nucleic acid or the protein is to be assessed.

NOV12 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immunospecifically to the novel NOV12 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV12 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.
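
Hydrophilic regions of the kind mentioned above are commonly located with a sliding-window hydropathy profile. The following is a minimal Python sketch, not the method of the disclosure: the Kyte-Doolittle scale, the window of 7 residues, the threshold of -1.0 and the function names are all assumptions chosen for illustration, since the text does not name a scale or parameters.

```python
# Kyte-Doolittle hydropathy values (assumed scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 7) -> list[float]:
    """Mean hydropathy of every full window along the protein."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophilic_windows(
    protein: str, window: int = 7, threshold: float = -1.0
) -> list[int]:
    """Zero-based start positions of windows whose mean falls below threshold."""
    profile = hydropathy_profile(protein, window)
    return [i for i, value in enumerate(profile) if value < threshold]


# N-terminal residues of the NOV12a polypeptide from Table 12B
print(hydrophilic_windows("MAHAPEPDPAASDLGDERPKWDNKAQYLLSCIGFAVGLGNIW"))
```

Window positions flagged in this way are only candidates; whether a given stretch is surface-exposed and immunogenic would have to be established experimentally.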



NOV13 

A disclosed NOV13 nucleic acid of 1748 nucleotides (also referred to as CG-57570-01)
encoding a cation transporter-like protein is shown in Table 13A. Putative untranslated
regions upstream and/or downstream from the coding region, if any, are underlined, and the
start and stop codons are in bold letters.



Table 13A. NOV13 nucleotide sequence (SEQ ID NO:35). 

GTTCACCCCAAGACTAAGTTCTTTCCCAAGTTAGAGAAGAAGAGAGAAAGCAAAAAGAAG
AGAGGAAAGTTCTCCCTTCCCCTCCTCCGTGCCTGTCATGTCCTCTAAGCCAGAGCCGAA
GGACGTCCACCAACTGAACGGGACTGGCCCTTCTGCCTCTCCCTGCTCTTCAGATGGCCC 
AGGGAGAGAGCCCTTGGCTGGGACCTCAGAGTTCCTGGGGCCTGATGGGGCTGGGGTAGA 
GGTGGTGATTGAGTCTCGGGCCAACGCCAAGGGGGTTCGGGAGGAGGACGCCCTGCTGGA 
GAACGGGAGCCAGAGCAACGAAAGTGACGACGTCAGCACAGACCGTGGCCCTGCGCCACC 
TTCCCCGCTCAAGGAGACCTCCTTTTCCATCGGGCTGCAAGTACTGTTTCCATTCCTCCT 
GGCAGGCTTTGGGACCGTGGCTGCTGGCATGGTGTTGGACATCGTGCAGCACTGGGAAGT 
CTTCCAGAAGGTGACAGAGGTCTTCATCCTAGTGCCTGCGCTGCTGGGGCTCAAAGGGAA 
CCTGGAAATGACCCTGGCATCAAGGCTTTCCACTGCAGCGAGTATCAACATTGGACACAT 
GGACACACCCAAGGAGCTCTGGCGGATGATCACTGGGAACATGGCCCTCATCCAGGTGCA 
GGCCACGGTGGTGGGCTTCCTGGCGTCCATCGCAGCCGTCGTCTTTGGCTGGATCCCTGA 
TGGCCACTTCAGTATTCCGCACGCCTTCCTGCTCTGTGCTAGCAGCGTGGCCACAGCCTT 
CATTGCCTCCCTGGTACTGGGTATGATCATGATTGGAGTCATCATTGGCTCTCGCAAGAT 
TGGGATCAACCCAGACAATGTGGCCACACCCATTGCTGCCAGCCTGGGCGACCTCATCAC 
CTTGGCGCTGCTCTCAGGCATCAGCTGGGGACTCCTGACCTCTGCCCTCTCAGATCACTG 
GCGATACATCTACCCACTGGTGTGTGCTTTCTTTGTGGCCCTGCTGCCTGTCTGGGTGGT 
GCTGGCCCGACGAAGTCCAGCCACAAGGGAGGTGTTGTACTCGGGCTGGGAGCCTGTTAT 
CATTGCCATGGCCATCAGCAGTGTGGGAGGCCTCATCTTGGACAAGACTGTCTCAGACCC 
CAACTTTGCTGGGATGGCTGTCTTCACGCCTGTGATTAATGGTGTTGGGGGCAATCTGGT 
GGCAGTGCAGGCCAGCCGCATCTCCACCTTCCTGCACATGAATGGAATGCCCGGAGAGAA 
CTCTGAGCAAGCTCCTCGCCGCTGTCCCAGTCCTTGTACCACCTTCTTCAGCCCTGGTGT 
GAATTCTCGCTCAGCCCGGGTCCTCTTCCTCCTCGTGGTCCCAGGACACCTGGTGTTCCT 
CTACACCATCAGCTGTATGCAGGGCGGGCACACCACCCTCACACTCATCTTCATCATCTT 
CTATATGACAGCTGCACTGCTCCAGGTGCTGATTCTCCTGTACATCGCAGACTGGATGGT 
GCACTGGATGTGGGGCCGGGGCCTGGACCCGGACAACTTCTCCATCCCATACTTGACTGC 
TCTGGGGGACCTGCTTGGCACTGGGCTCCTAGCACTCAGCTTCCATGTTCTCTGGCTCAT 
AGGGGACCGAGACACGGATGTCGGGGACTAGCTTGGTCACTCAACATTTTCCCCATCCCT
CTGCACTTTCTATTTGAAATTTTTCTTTTGTTCCCCTGTCCCTCCTCCACCCCACACTCC
CACCTCTT



In a search of public sequence databases, the NOV13 nucleic acid sequence, located on
chromosome 1, has 440 of 674 bases (65%) identical to a gb:GENBANK-ID:AK021925|acc:AK021925.1
mRNA from Homo sapiens (Homo sapiens cDNA FLJ11863
fis, clone HEMBA1006926). Public nucleotide databases include all GenBank databases and
the GeneSeq patent database.

The disclosed NOV13 polypeptide (SEQ ID NO:36) encoded by SEQ ID NO:35 has
517 amino acid residues and is presented in Table 13B using the one-letter amino acid code.
SignalP, Psort and/or Hydropathy results predict that NOV13 has a signal peptide and is
likely to be localized at the plasma membrane with a certainty of 0.6000.
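
The relationship between the nucleotide listing and the encoded polypeptide can be checked with a short translation routine. The following is a minimal sketch, not the method used to prepare the tables: it assumes the standard genetic code, takes the first ATG as the start codon (a simplification), and stops at the first in-frame stop codon. The function name `translate_orf` is illustrative only.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(dna: str) -> str:
    """Translate from the first ATG up to (not including) the first in-frame stop."""
    dna = "".join(dna.split()).upper()  # drop spaces and line breaks
    start = dna.find("ATG")
    if start < 0:
        return ""
    residues = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unreadable codon
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# Opening bases of Table 13A around the start codon (spacing removed).
fragment = "GCCTGTCATGTCCTCTAAGCCAGAGCCGAAGGACGTCCACCAACTG"
print(translate_orf(fragment))
```

Applied to the fragment above, the routine returns the same leading residues as the first row of Table 13B.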



Table 13B. Encoded NOV13 protein sequence (SEQ ID NO:36). 

MSSKPEPKDVHQLNGTGPSASPCSSDGPGREPLAGTSEFLGPDGAGVEVVIESRANAKGV
REEDALLENGSQSNESDDVSTDRGPAPPSPLKETSFSIGLQVLFPFLLAGFGTVAAGMVL
DIVQHWEVFQKVTEVFILVPALLGLKGNLEMTLASRLSTAASINIGHMDTPKELWRMITG
NMALIQVQATVVGFLASIAAVVFGWIPDGHFSIPHAFLLCASSVATAFIASLVLGMIMIG
VIIGSRKIGINPDNVATPIAASLGDLITLALLSGISWGLLTSALSDHWRYIYPLVCAFFV
ALLPVWVVLARRSPATREVLYSGWEPVIIAMAISSVGGLILDKTVSDPNFAGMAVFTPVI
NGVGGNLVAVQASRISTFLHMNGMPGENSEQAPRRCPSPCTTFFSPGVNSRSARVLFLLV
VPGHLVFLYTISCMQGGHTTLTLIFIIFYMTAALLQVLILLYIADWMVHWMWGRGLDPDN
FSIPYLTALGDLLGTGLLALSFHVLWLIGDRDTDVGD



A search of sequence databases reveals that the NOV13 amino acid sequence has 307
of 456 amino acid residues (67%) identical to, and 373 of 456 amino acid residues (81%)
similar to, the 490 amino acid residue ptnr:TREMBLNEW-ACC:CAB66762 protein from
Homo sapiens (Human) (HYPOTHETICAL 53.3 KDA PROTEIN). Public amino acid
databases include the GenBank databases, SwissProt, PDB and PIR.
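
The percentages quoted in the two database-search paragraphs above follow from the match counts. As a hedged illustration, the sketch below assumes the percentage is the integer part of 100 x matches / aligned length (truncation rather than rounding), which reproduces the figures stated in this passage; the function name is hypothetical.

```python
def percent_truncated(matches: int, aligned_length: int) -> int:
    """Integer percentage of matching positions, truncated toward zero."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


# Figures quoted for NOV13 in the text above.
print(percent_truncated(440, 674))  # nucleotide identity
print(percent_truncated(307, 456))  # amino acid identity
print(percent_truncated(373, 456))  # amino acid similarity
```

Note that 373/456 is about 81.8%, so rounding would give 82% rather than the 81% stated; truncation is assumed here only because it matches the quoted values.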

NOV13 is expressed in at least Liver, Pituitary Gland, Heart, Uterus, and B-cells. This
information was derived by determining the tissue sources of the sequences that were included
in the invention including but not limited to SeqCalling sources, Public EST sources,
Literature sources, and/or RACE sources.

The disclosed NOV13 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 13C.



Table 13C. BLAST results for NOV13

Gene Index/Identifier                          Protein/Organism                                                          Length (aa)  Identity (%)   Positives (%)  Expect
gi|14149819|ref|NP_115524.1| (NM_032148)       hypothetical protein DKFZp434K0427 [Homo sapiens]                         490          305/457 (66%)  372/457 (80%)  e-166
gi|12053165|emb|CAB66762.1| (AL136828)         hypothetical protein [Homo sapiens]                                       490          305/457 (66%)  372/457 (80%)  e-166
gi|15079232|gb|AAH11108.1|AAH11108 (BC011108)  protein for MGC:18986) [Mus musculus]                                     488          235/438 (53%)  306/438 (69%)  e-112
gi|14290540|gb|AAH09039.1|AAH09039 (BC009039)  Similar to hypothetical protein FLJ20473 [Homo sapiens]                   507          237/438 (54%)  306/438 (69%)  e-111
gi|12833620|dbj|BAB22598.1| (AK003140)         homolog to CDNA FLJ12718 FIS, CLONE NT2RP1001286~putative [Mus musculus]  462          230/445 (51%)  304/445 (67%)  e-109



Table 13D lists the domain descriptions from DOMAIN analysis results against
NOV13. This indicates that the NOV13 sequence has properties similar to those of other
proteins known to contain this domain.



Table 13D. Domain Analysis of NOV13 

gnl|Pfam|pfam01769, MgtE, Divalent cation transporter. This region is
the integral membrane part of the eubacterial MgtE family of magnesium
transporters. Related regions are found also in archaebacterial and
eukaryotic proteins. All the archaebacterial and eukaryotic examples
have two copies of the region. This suggests that the eubacterial
examples may act as dimers. Members of this family probably transport
Mg2+ or other divalent cations into the cell. The alignment contains
two highly conserved Ds that may be involved in cation binding
(Bateman A unpubl.)

CD-Length = 131 residues, 99.2% aligned

Score = 66.6 bits (161), Expect = 3e-12



A gene family encoding many Na(+)- and Cl(-)-dependent organic solute
cotransporters has recently been recognized. Among the cotransporters that have been
characterized are those for neurotransmitters, amino acids, and organic osmolytes. The ROSIT
cDNA is 2,354 bp long with an open reading frame of 1,845 bp. The deduced sequence of 615
amino acids shows ROSIT to be most clearly related to two orphan cDNAs of this family isolated from
brain. Northern analysis showed the mRNA is normally expressed in renal cortex but not in
brain, heart, colon, liver, stomach, or skeletal muscle. Moreover, hypernatremic rats displayed
a marked increase in mRNA levels in renal cortex, renal outer medulla, and perhaps intestine.
Heterologous expression of the cRNA in Xenopus laevis oocytes failed to reveal the function
of this gene product when analyzed with isotope fluxes or electrophysiological measurements
using a wide variety of organic solutes. ROSIT is likely to be involved in kidney reclamation
of an organic osmolyte or osmolyte precursor required for adaptation to hypertonic stress. (See
Wasserman et al., Am J Physiol 1994 Oct;267(4 Pt 2):F688-94).

Nramp1 regulates macrophage activation in infectious and autoimmune diseases.
Nramp2 controls anaemia. Both are divalent cation (Fe(2+), Zn(2+), and Mn(2+)) transporters;
Nramp2 is a symporter of H(+) and metal ions, Nramp1 a H(+)/divalent cation antiporter. This
provides a model for metal ion homeostasis in macrophages. Nramp2, localised to early
endosomes, delivers extracellularly acquired divalent cations into the cytosol. Nramp1,
localised to late endosomes/lysosomes, delivers divalent cations from the cytosol to
phagolysosomes. Here, Fe(2+) generates antimicrobial hydroxyl radicals via the Fenton
reaction. Zn(2+) and Mn(2+) may also influence endosomal metalloprotease activity and
phagolysosome fusion. The many cellular functions dependent on metal ions as cofactors may
explain the multiple pleiotropic effects of Nramp1, and its complex roles in infectious and
autoimmune disease. (See Blackwell et al., Microbes Infect 2000 Mar;2(3):317-21).

Mutations in the gene encoding the renal epithelial K(+) channel ROMK1 (Kir 1.1) are
one of the causes of Bartter's syndrome, an autosomal recessive disease. The syndrome results in defective
renal tubular transport in the thick ascending limb of the loop of Henle that leads to
hypokalemic metabolic alkalosis and loss of salt. Two novel ROMK1 mutations,
L220F/A156V, have been described recently in a compound heterozygote patient
demonstrating typical manifestations of Bartter's syndrome. Functional properties of these
ROMK1 mutants were studied by coexpression in Xenopus oocytes and by means of double
electrode voltage clamp experiments. When both ROMK1 mutants were coexpressed no K(+)
conductance could be detected. The same was found in oocytes expressing A156V-ROMK1
only or coexpressing wild type (wt) ROMK1 together with A156V-ROMK1. In contrast, K(+)
conductances were indistinguishable from those of wt-ROMK1 when L220F-ROMK1 was
expressed alone. Activation of protein kinase C signaling inhibited the conductance in both
L220F-ROMK1 and wt-ROMK1 expressing oocytes. These effects were not seen in
A156V-ROMK1 expressing oocytes.

The disclosed NOV13 nucleic acid of the invention encoding a cation transporter-like
protein includes the nucleic acid whose sequence is provided in Table 13A or a fragment
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may
be changed from the corresponding base shown in Table 13A while still encoding a protein
that maintains its cation transporter-like activities and physiological functions, or a fragment
of such a nucleic acid. The invention further includes nucleic acids whose sequences are
complementary to those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 35 percent of the bases may be so changed.
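
The "up to about 35 percent of the bases" limit can be read as a bound on the fraction of substituted positions. The sketch below is an illustrative assumption, not a definition from the specification: it compares two sequences of equal length position by position, counts substitutions only (no insertions or deletions), and uses hypothetical function names.

```python
def fraction_changed(reference: str, variant: str) -> float:
    """Fraction of positions at which the variant differs from the reference."""
    if not reference or len(reference) != len(variant):
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(
        1 for a, b in zip(reference.upper(), variant.upper()) if a != b
    )
    return differences / len(reference)


def within_limit(reference: str, variant: str, max_fraction: float = 0.35) -> bool:
    """True when the substituted fraction does not exceed max_fraction."""
    return fraction_changed(reference, variant) <= max_fraction


# Two substitutions in twelve bases, well under the 35 percent bound.
print(within_limit("ATGTCCTCTAAG", "ATGTCTTCTAAA"))
```

The same comparison applies to the residue limit stated for the protein by passing one-letter amino acid strings and a different `max_fraction`.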

The disclosed NOV13 protein of the invention includes the cation transporter-like
protein whose sequence is provided in Table 13B. The invention also includes a mutant or
variant protein any of whose residues may be changed from the corresponding residue shown
in Table 13B while still encoding a protein that maintains its cation transporter-like activities
and physiological functions, or a functional fragment thereof. In the mutant or variant protein,
up to about 33 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this cation
transporter-like protein (NOV13) may function as a member of a "cation transporter family". Therefore,
the NOV13 nucleic acids and proteins identified here may be useful in potential therapeutic
applications implicated in (but not limited to) various pathologies and disorders as indicated
below. The potential therapeutic applications for this invention include, but are not limited to:
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV13 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the cation transporter-like
protein (NOV13) may be useful in gene therapy, and the cation transporter-like protein
(NOV13) may be useful when administered to a subject in need thereof. By way of
nonlimiting example, the compositions of the present invention will have efficacy for
treatment of patients suffering from cancer, trauma, regeneration (in vitro and in vivo),
viral/bacterial/parasitic infections, cardiomyopathy, atherosclerosis, hypertension, congenital
heart defects, aortic stenosis, atrial septal defect (ASD), atrioventricular (A-V) canal defect,
ductus arteriosus, pulmonary stenosis, subaortic stenosis, ventricular septal defect (VSD),
valve diseases, tuberous sclerosis, scleroderma, obesity, transplantation, endometriosis,
fertility, Von Hippel-Lindau (VHL) syndrome, cirrhosis, endocrine dysfunctions, diabetes,
growth and reproductive disorders, or other pathologies or conditions. The NOV13
nucleic acid encoding the cation transporter-like protein of the invention, or fragments thereof,
may further be useful in diagnostic applications, wherein the presence or amount of the nucleic
acid or the protein is to be assessed.

NOV13 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immunospecifically to the novel NOV13 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV13 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding the pathology of the disease and in the development of new drug targets for various
disorders.
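
A hydrophobicity chart of the kind mentioned above can be approximated with a sliding-window average. The sketch below is one possible illustration under stated assumptions: it uses the published Kyte-Doolittle hydropathy scale, and the window size, cutoff, and function names are arbitrary choices made here, not values given in the specification.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 7) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophilic_windows(protein: str, window: int = 7, cutoff: float = -1.5):
    """(0-based start, peptide) for windows whose mean is at or below cutoff."""
    profile = hydropathy_profile(protein, window)
    return [
        (start, protein[start:start + window])
        for start, mean in enumerate(profile)
        if mean <= cutoff
    ]


# C-terminal residues of Table 13B, scored as a single nine-residue window.
print(hydrophilic_windows("GDRDTDVGD", window=9))
```

Windows flagged this way are only candidates; surface exposure and uniqueness would still need to be assessed before choosing an immunogen.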



NOV14 

A disclosed NOV14 nucleic acid of 5175 nucleotides (also referred to as CG57593-01)
encoding an ABC transporter-like protein is shown in Table 14A. Putative untranslated regions
upstream and/or downstream from the coding region, if any, are underlined, and the start and
stop codons are in bold letters.



Table 14A. NOV14 nucleotide sequence (SEQ ID NO:37). 



AATGTTCAAAGGCTTTCTGTAACTGAACTT TTTTTTT TTCTTTTTTCCCCTAG CTATTGC 
TCTGCAATATTTACTTTACCCTGTTAATGA ACAGGACAAAATGGTTAAAAAAGAGATAAG
CGTGCGTCAACAAATTCAGGCTCTTCTGTACAAGAATTTTCTTAAAAAATGGAGAATAAA
AAGAGAGCTGGAGGAATGGACAATAACATTGTTTCTAGGGCTATATTTGTGCATCTTTTC 
GGAACACTTCAGAGCTACCCGTTTTCCTGAACAACCTCCTAAAGTCCTGGGAAGCGTGGA 
TCAGTTTAATGACTCTGGCCTGGTAGTGGCATATACACCAGTCAGTAACATAACACAAAG 
GATAATGAATAAGATGGCCTTGGCTTCCTTTATGAAAGGTAGAACAGTCATTGGGACACC 
AGATGAAGAGACCATGGATATAGAACTTCCAAAAAAATACCATGAAATGGTGGGAGTTAT 
ATTTAGTGATACTTTCTCATATCGCCTGAAGTTTAATTGGGGATATAGAATCCCAGTTAT 
AAAGGAGCACTCTGAATACACAGGTCACTGTTGGGCCATGCATGGTGAAATTTTTTGTTA 
CTTGGCAAAGTACTGGCTAAAAGGGTTTGTAGCTTTTCAAGCTGCAATTAATGCTGCAAT 
TATAGAAGTAAGTACAACAAATCATTCTGTAATGGAGGAGTTGACATCAGTTATTGGAAT 
AAATATGAAGATACCACCTTTCATTTCTAAGGGAGAAATTATGAATGAATGGTTTCATTT 
TACTTGCTTAGTTTCTTTCTCTTCTTTTATATACTTTGCATCATTAAATGTTGCAAGGGA 
AAGAGGAAAATTTAAGAAACTGATGACAGTGATGGGTCTCCGAGAGTCAGCATTCTGGCT 
CTCCTGGGGATTGACATACATTTGCTTCATCTTCATTATGTCCATTTTTATGGCTCTGGT 
CATAACATCAATCCCAATTGTATTTCATACTGGCTTCATGGTGATATTCACACTCTATAG 
CTTATATGGCCTTTCTTTGGTGTTGGCTTTCCTCATGAGTGTTTTAATAAGGAAACCTAT 
GCTCGCTGGTTTGGCTGGATTTCTCTTCACTGTATTTTGGGGATGTCTGGGATTCACTGT 
GTTATATAGACAACTTCCTTTATCTTTGGGATGGGTATTAAGTCTTCTTAGCCCTTTTGC 
CTTCACTGCTGGAATACAGATTACACACCTGGATAATTACTTAAGTGGTGTTATTTTTCC 
TGATCCCTCTGGGGATTCATACAAAATGATAGCCACTTTTTTCATTTTGGCATTTGATAC 
TCTTTTCTATTTGATATTCACATTATATTTTGAGCGAGTTTTACCTGGTAAGGGCCATGG 
GGATTCTCCATTATTTTTCCTTAAGTCCTCATTTTGGTCCAAACATCAAAATACTCATCA 
TGAAATCTTTGAGAATGAAATAAATCCTGAGCATTCCTCTGATGATTCTTTTGAACCGGT 
GTCTCCAGAATTCCATGGAAAAGAAGCCATAAGGATCAGAAATGTTATAAAAGAATATAA 
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TGGAAAGACTGGAAAAGTAGAAGCATTGCAAATATTTTTTGACATATATGAAGGACAGAT 
CACTGCAATACTTGGGCATAATGGAGCTGGTAAATCAACACTGCTAAACATTCTTAGTGG 
ATTGTCTGTTTCTACAGAAGGTTCAGCCACTATTTATAATACTCAACTCTCTGAAATAAC 
TGACATGGAAGAAATTAGAAAGAATATTGGATTTTGTCCACAGTTCAATTTTCAATTTGA 
CTTCCTCACTGTGAGAGAAAACCTCAGGGTATTTGCTAAAATAAAAGGGATTCAGCCAAA 
GGAAGTGGAACAAGAGGTATTGCTGCTAGATGAACCAACTGCTGGATTGGATCCCTTTTC 
AAGACACCGAGTGTGGAGCCTCCTGAAGGAGCATAAAGTAGACCGACTTATCCTCTTCAG 
TACCCAATTCATGGATGAGGCTGACATCTTGGCTGATAGGAAAGTATTTCTGTCTAATGG 
GAAGTTGAAATGTGCAGGATCATCTTTGTTTCTGAAGCGAAAGTGGGGTATTGGATATCA 
TTTAAGTTTACACAGGAATGAAATGTGTGACACAGAAAAAATCACATCCCTTATTAAGCA 
GCACATTCCTGATGCCAAGTTAACAACAGAAAGTGAAGAAAAACTTGTATATAGTTTGCC 
TTTGGAAAAAACGAACAAATTTCCAGATCTTTACAGTGACCTTGATAAGTGTTCTGACCA 
GGGCATAAGGAATTATGCTGTTTCAGTGACATCTCTGAATGAAGTATTCTTGAACCTAGA 
AGGAAAATCAGCAATTGATGAACCAGGTATATTTGACATTGGGAAACAAGAGAAAATACA 
TGTGACAAGAAATACTGGAGATGAGTCTGAAATGGAACAGGTTCTTTGTTCTCTTCCTGA 
AACAAGAAAGGCTGTCAGTAGTGCAGCTCTCTGGAGCCGACAAATCTATGCAGTGGCAAC 
ACTTCGCTTCTTAAAGTTAAGGCGTGAAAGGAGAGCTCTTTTGTGTTTGTTACTAGTACT 
TGGAATTGCTTTTATCCCCATCATTCTAGAGAAGATAATGTATAAAGTAACTCGTGAAAC 
TCATTGTTGGGAGTTTTCACCCAGTATGTATTTCCTTTCTCTGGAACAAATCCCGAAGAC 
GCCTCTTACCAGCCTGTTAATCGTTAATAATACAGGTTCAAATATTGAAGACCTCGTGCA 
TTCACTGAAGTGTCAGGATATAGTTTTGGAAATAGATGACTTTAGAAACAGAAATGGCTC 
AGATGATCCCTCCTACAATGGAGCCATCATAGTGTCTGGTGACCAGAAGGATTACAGATT 
TTCAGTTGCATGTAATACCAAGAAATCGAATTGTTTTCCGGTTCTTATGGGAATTGTTAG 
CAATGCCCTTATTGGAATTTTTAACTTCACAGAGCTTATTCAAATGGAGAGCACCTTCAT 
TTTTCGTGATGACATAGTGCTGGATCTTGGTTTTATAGATGGGTCCATATTTTTGTTGTT 
GATCACAAACTGCATTTCTCCTTATATTGGCATAACAGCATCAGTGATTATTAAAGTAAG
AGGGAGAGAGAGGTCCCAGTTATGGATTTCAGGCCTCTGGCCTTCAGCATACTGGTGTGG 
ACAGGCTCTGGTGGACATTCCATTATACTTCTTGATTCTCTTTTCAATACATTTAATTTA 
CTACTTCATATTTCTGGGATTCCAGCTTTCATGGGAACTCATGTTTGTTTTGGTAAGTGA 
TCCATTATTTGCAGGTGGTATGCATAATTGGTTGTGCAGTTTCTCTTATATTCCTCACAT 
ATGTGCTTTCATTCATCTTTCGCAAGTGGAGAAAAAAAATGGCTTTTGGTCTTTTGGCTT 
TTTTATTGTAAGTATATATACATGCGTGCACATTTATATTAAATTTTATTTGCTTGATAA 
ATCTTTTTTGCCACTTGTGTTTACTTTTAACTTTTATTGTTCTTATGCTTTAATGCCTGT 
CTCTTGTAAATCTGTCTTACTTTTTGCTTTTTTATTTCTCCAAAAATATCTCCATATAGC 
TCCAATCCTTTTCCCTCTGTTTGCTTTTGTTAGTGTTATTTTCCTTTTTGTCATAAGGTG 
TCTGGAAATGAAGTATGGAAATGAAATAATGAATAAAGACCCAGTTTTCAGGATCTCTCC 
ACGGAGTAGAGAAACTCATCCCAATCCGGAAGAGCCCGAAGAAGAAGATGAAGATGTTCA 
AGCTGAAAGAGTCCAAGCAGCAAATGCACTCACTGCTCCAAACTTGGAGGAGGAACCAGT 
CATAACTGCAAGCTGTTTACACAAGGAATATTATGAGACAAAGAAAAGTTGCTTTTCAAC 
AAGAAAGAAGAAAATAGCCATCAGAAATGTTTCCTTTTGTGTTAAAAAAGGTGAAGTTTT 
GGGATTACTAGGACACAATGGAGCTGGTAAAAGTACTTCCATTAAAATGATAACTGGGTG 
CACAAAGCCAACTGCAGGAGTGGTAGTGGTGTTACAAGGCAGCAGAGCATCAGTAAGGCA 
ACAGCATGACAACAGCCTCAAGTTCTTGGGGTACTGCCCTCAGGAGAACTCACTGTGGCC 
CAAGCTTACAATGAAAGAGCACTTGGAGTTGTATGCAGCTGTGAAAGGACTGGGCAAAGA 
AGATGCTGCTCTCAGTATTTCAAGATTGGTGGAAGCTCTTAAGCTCCAGGAACAACTTAA 
GGCTCCTGTGAAAACTCTATCAGAGGGAATAAAGAGAAAGCTGTGCTTTGTGCTGAGCAT 
CCTGGGGAACCCATCAGTGGTGCTTCTAGATGAGCCGTTCACCGGGATGGACCCCGAGGG 
GCAGCAGCAAATGTGGCTTCAGGCTACCGTTAAAAACAAGGAGAGGGGCACCCTCTTGAC 
CACCCATTACATGTCAGAGGCTGAGGCTGTGTGTGACCGTATGGCCATGATGGTGTCAGG 
AACGCTAAGGAGGTGTATTGGTTCCATTCAACATCTGAAAAACAAGTTTGGTAGAGATTA 
TTTACTAGAAATAAAAATGAAAGAACCTACCCAGGTGGAAGCTCTCCACACAGAGATTTT
GAAGCTTTTCCCACAGGCTGCTTGGCAGGAAAGATATTCCTCTTTAATGGCGTATAAGTT 
ACCTGTGGAGGATGTCCACCCTCTATCTCGGGCCTTTTTCAAGTTAGAGCGAGTGAAGCA 
GACCTTCAACCTGGAGGAATACAGCCTCTCTCAGGCTACCTTGGAGCAGGTGTTCTTAGA 
ACTCTGTAAAGAGCAGGAGCTGGGAAATGTTGATGATAAAATTGATACAACAGTTGAATG 
GAAACTTCTCCCACAGGAAGACCCTTA AAATGAAGAACCTCCTAACATTCAATTTTAGGT 
CCTACTACATTGTTAGTTTCCATAATTCTACAAGAATGTTTCCTTTTACTTCA GTTAACA 
AAAGAAAACATTTAATAAACATTCAATAATGATTACAGTTTTCATTTTTAAAAATTTAGG 
ATGAAGGAAACAAGGAAATATAGGGAAAAGTAGTAGACAAAATTAACAAAATCAGACATG 
TTATTCATCCCCAACATGGGTCTATTTTGTGCTTAAAAATAATTTAAAAATCATACAATA 
TTAGGTTGGTTATCG 

In a search of public sequence databases, the NOV14 nucleic acid sequence, located on
chromosome 17, has 1737 of 2520 bases (68%) identical to a
gb:GENBANK-ID:AB020629|acc:AB020629.1 mRNA from Homo sapiens (Homo sapiens mRNA for
KIAA0822 protein, complete cds). Public nucleotide databases include all GenBank databases
and the GeneSeq patent database.

The disclosed NOV14 polypeptide (SEQ ID NO:38) encoded by SEQ ID NO:37 has
1595 amino acid residues and is presented in Table 14B using the one-letter amino acid code.
SignalP, Psort and/or Hydropathy results predict that NOV14 has a signal peptide and is
likely to be localized at the plasma membrane with a certainty of 0.8000. The most likely
cleavage site for a NOV14 peptide is between amino acids 52 and 53.
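
A cleavage site "between amino acids 52 and 53" uses 1-based residue numbering, which is easy to misapply with 0-based string indexing. The sketch below illustrates the conversion on the first 60 residues of the NOV14 sequence (extraction spacing removed); the function name is hypothetical and the snippet makes no claim about how the prediction itself was produced.

```python
def split_at_cleavage(protein: str, last_signal_residue: int) -> tuple[str, str]:
    """Split after the given 1-based residue: (signal peptide, mature chain)."""
    if not 0 < last_signal_residue < len(protein):
        raise ValueError("cleavage position must fall inside the sequence")
    return protein[:last_signal_residue], protein[last_signal_residue:]


# First 60 residues of the NOV14 polypeptide.
n_terminus = "MVKKEISVRQQIQALLYKNFLKKWRIKRELEEWTITLFLGLYLCIFSEHFRATRFPEQPP"
signal, mature_start = split_at_cleavage(n_terminus, 52)
print(len(signal), signal[-5:], mature_start)
```

With residue 52 as the last residue of the signal peptide, the mature chain in this fragment begins at residue 53.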



Table 14B. Encoded NOV14 protein sequence (SEQ ID NO:38). 

MVKKEISVRQQIQALLYKNFLKKWRIKRELEEWTITLFLGLYLCIFSEHFRATRFPEQPP
KVLGSVDQFNDSGLVVAYTPVSNITQRIMNKMALASFMKGRTVIGTPDEETMDIELPKKY
HEMVGVIFSDTFSYRLKFNWGYRIPVIKEHSEYTGHCWAMHGEIFCYLAKYWLKGFVAFQ
AAINAAIIEVSTTNHSVMEELTSVIGINMKIPPFISKGEIMNEWFHFTCLVSFSSFIYFA
SLNVARERGKFKKLMTVMGLRESAFWLSWGLTYICFIFIMSIFMALVITSIPIVFHTGFM
VIFTLYSLYGLSLVLAFLMSVLIRKPMLAGLAGFLFTVFWGCLGFTVLYRQLPLSLGWVL
SLLSPFAFTAGIQITHLDNYLSGVIFPDPSGDSYKMIATFFILAFDTLFYLIFTLYFERV
LPGKGHGDSPLFFLKSSFWSKHQNTHHEIFENEINPEHSSDDSFEPVSPEFHGKEAIRIR
NVIKEYNGKTGKVEALQIFFDIYEGQITAILGHNGAGKSTLLNILSGLSVSTEGSATIYN
TQLSEITDMEEIRKNIGFCPQFNFQFDFLTVRENLRVFAKIKGIQPKEVEQEVLLLDEPT
AGLDPFSRHRVWSLLKEHKVDRLILFSTQFMDEADILADRKVFLSNGKLKCAGSSLFLKR
KWGIGYHLSLHRNEMCDTEKITSLIKQHIPDAKLTTESEEKLVYSLPLEKTNKFPDLYSD
LDKCSDQGIRNYAVSVTSLNEVFLNLEGKSAIDEPGIFDIGKQEKIHVTRNTGDESEMEQ
VLCSLPETRKAVSSAALWSRQIYAVATLRFLKLRRERRALLCLLLVLGIAFIPIILEKIM
YKVTRETHCWEFSPSMYFLSLEQIPKTPLTSLLIVNNTGSNIEDLVHSLKCQDIVLEIDD
FRNRNGSDDPSYNGAIIVSGDQKDYRFSVACNTKKSMCFPVLMGIVSNALIGIFNFTELI
QMESTFIFRDDIVLDLGFIDGSIFLLLITNCISPYIGITASVIIKVRGRERSQLWISGLW
PSAYWCGQALVDIPLYFLILFSIHLIYYFIFLGFQLSWELMFVLVSDPLFAGGMHNWLCS
FSYIPHICAFIHLSQVEKKNGFWSFGFFIVSIYTCVHIYIKFYLLDKSFLPLVFTFNFYC
SYALMPVSCKSVLLFAFLFLQKYLHIAPILFPLFAFVSVIFLFVIRCLEMKYGNEIMNKD
PVFRISPRSRETHPNPEEPEEEDEDVQAERVQAANALTAPNLEEEPVITASCLHKEYYET
KKSCFSTRKKKIAIRNVSFCVKKGEVLGLLGHNGAGKSTSIKMITGCTKPTAGVVVVLQG
SRASVRQQHDNSLKFLGYCPQENSLWPKLTMKEHLELYAAVKGLGKEDAALSISRLVEAL
KLQEQLKAPVKTLSEGIKRKLCFVLSILGNPSVVLLDEPFTGMDPEGQQQMWLQATVKNK
ERGTLLTTHYMSEAEAVCDRMAMMVSGTLRRCIGSIQHLKNKFGRDYLLEIKMKEPTQVE
ALHTEILKLFPQAAWQERYSSLMAYKLPVEDVHPLSRAFFKLERVKQTFNLEEYSLSQAT
LEQVFLELCKEQELGNVDDKIDTTVEWKLLPQEDP



A search of sequence databases reveals that the NOV14 amino acid sequence has 747
of 1321 amino acid residues (56%) identical to, and 951 of 1321 amino acid residues (71%)
similar to, the 1581 amino acid residue ptnr:SPTREMBL-ACC:O94911 protein from Homo
sapiens (Human) (KIAA0822 PROTEIN). Public amino acid databases include the GenBank
databases, SwissProt, PDB and PIR.

NOV14 is expressed in at least epidermis. This information was derived by
determining the tissue sources of the sequences that were included in the invention including
but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE
sources.

The disclosed NOV14 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 14C.



Table 14C. BLAST results for NOV14

Gene Index/Identifier                      Protein/Organism                                                          Length (aa)  Identity (%)     Positives (%)     Expect
(XM_085647)                                ATP-binding cassette, sub-family A (ABC1), member 10 [Homo sapiens]       1543         1391/1563 (88%)  1423/1563 (90%)   0.0
gi|17933760|ref|NP_525021.1| (NM_080282)   ATP-binding cassette, sub-family A (ABC1), member 10 [Homo sapiens]       1543         1387/1563 (88%)  1420/1563 (90%)   0.0
gi|6005701|ref|NP_009099.1| (NM_007168)    ATP-binding cassette, sub-family A (ABC1), member 8 [Homo sapiens]        1581         943/1604 (58%)   1183/1604 (72%)   0.0
gi|17933764|ref|NP_525023.1| (NM_080284)   ATP-binding cassette, sub-family A (ABC1), member 6 [Homo sapiens]        1617         951/1659 (57%)   1173/1659 (70%)   0.0
gi|18587273|ref|XP_085646.1| (XM_085646)   ATP-binding cassette, sub-family A (ABC1), member 9 [Homo sapiens]        1624         940/1653 (56%)   1189/1653 (71%)   0.0


Table 14D lists the domain descriptions from DOMAIN analysis results against
NOV14. This indicates that the NOV14 sequence has properties similar to those of other proteins
known to contain this domain.

Table 14D. Domain Analysis of NOV14

gnl|Pfam|pfam00005, ABC_tran, ABC transporter. ABC transporters form a
large family of proteins responsible for translocation of a variety of
compounds across biological membranes. ABC transporters are the
largest family of proteins in many completely sequenced bacteria. ABC
transporters are composed of two copies of this domain and two copies
of a transmembrane domain pfam00664. These four domains may belong to
a single polypeptide, or belong in different polypeptide chains.

CD-Length = 183 residues, 98.4% aligned

Score = 132 bits (331), Expect = 2e-31
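
The statement that ABC transporters carry two copies of the nucleotide-binding domain can be illustrated by scanning for the Walker A (P-loop) motif that such domains typically contain. The pattern used below, G-x(4)-G-K-[ST], is the commonly cited consensus and is an assumption brought in for illustration; it is not part of the DOMAIN analysis reported in Table 14D, and the function name is hypothetical.

```python
import re

# Walker A / P-loop consensus: G, any four residues, G, K, then S or T.
# The lookahead lets overlapping candidates be reported as well.
WALKER_A = re.compile(r"(?=(G.{4}GK[ST]))")


def find_walker_a(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched motif) for each candidate."""
    return [
        (match.start() + 1, match.group(1))
        for match in WALKER_A.finditer(protein.upper())
    ]


# Two short stretches taken from the NOV14 protein sequence in Table 14B.
print(find_walker_a("ILGHNGAGKSTLLNIL"))
print(find_walker_a("LLGHNGAGKSTSIKMI"))
```

Both stretches contain the same GHNGAGKS motif, which is consistent with two nucleotide-binding regions in a single polypeptide; a motif match alone does not establish function.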

Table 14E. Domain Analysis of NOV14

gnl|Smart|smart00382, AAA, ATPases associated with a variety of
cellular activities; AAA - ATPases associated with a variety of
cellular activities. This profile/alignment only detects a fraction of
this vast family. The poorly conserved N-terminal helix is missing
from the alignment.

CD-Length = 151 residues, 98.7% aligned

Score = 49.3 bits (116), Expect = 2e-06



ABCA1, a member of the ATP binding cassette family, mediates the efflux of excess
cellular lipid to HDL and is defective in Tangier disease. The apolipoprotein acceptor
specificity for lipid efflux by ABCA1 was examined in stably transfected Hela cells
expressing a human ABCA1-GFP fusion protein. ApoA-I and all of the other exchangeable
apolipoproteins tested (apoA-II, apoA-IV, apoC-I, apoC-II, apoC-III, apoE) showed greater
than a threefold increase in cholesterol and phospholipid efflux from ABCA1-GFP transfected
cells compared to control cells. Expression of ABCA1 in Hela cells also resulted in a marked
increase in specific binding of both apoA-I (Kd = 0.60 µg/mL) and apoA-II (Kd = 0.58 µg/mL)
to a common binding site. In summary, ABCA1-mediated cellular binding of apolipoproteins
and lipid efflux is not specific for only apoA-I but can also occur with other apolipoproteins
that contain multiple amphipathic helical domains. (See Remaley et al., Biochem Biophys Res
Commun 2001 Jan 26;280(3):818-823).

The molecular mechanisms regulating the amount of dietary cholesterol retained in the
body, as well as the body's ability to exclude selectively other dietary sterols, are poorly
understood. An average western diet will contain about 250-500 mg of dietary cholesterol and
about 200-400 mg of non-cholesterol sterols. About 50-60% of the dietary cholesterol is
absorbed and retained by the normal human body, but less than 1% of the non-cholesterol
sterols are retained. Thus, there exists a subtle mechanism that allows the body to distinguish
between cholesterol and non-cholesterol sterols. In sitosterolemia, a rare autosomal recessive
disorder, affected individuals hyperabsorb not only cholesterol but also all other sterols,
including plant and shellfish sterols from the intestine. The major plant sterol species is
sitosterol; hence the name of the disorder. Consequently, patients with this disease have very
high levels of plant sterols in the plasma and develop tendon and tuberous xanthomas,
accelerated atherosclerosis, and premature coronary artery disease. The STSL locus was
mapped to human chromosome 2p21 (ref. 4) and was localized to a region of less than 2 cM
bounded by markers D2S2294 and D2S2291. A new member of the ABC transporter family,
ABCG5, is mutant in nine unrelated sitosterolemia patients. (See Lee et al., Nat Genet 2001
Jan;27(1):79-83).

Pseudoxanthoma elasticum (PXE) is an inherited systemic disorder of connective
tissue, characterized by progressive calcification of the elastic fibers in the eye, the skin, and
the cardiovascular system. The PXE locus has been mapped to chromosome 16p13.1, and was
recently further refined to a 500 kb region containing four candidate genes: MRP1 (ABCC1),
MRP6 (ABCC6), pM5, and two copies of an unknown gene, the latter subsequently found to
be identical to the gene encoding the Nuclear Pore Interacting Protein (NPIP). In a
comprehensive mutational screening, the entire coding regions of the pM5, MRP1, and NPIP
genes were analyzed in 7 patients affected with pseudoxanthoma elasticum, but no
evidence of disease-causing defects was found in any of these three genes. Five synonymous (G232G,
P395P, A862A, G912G, D1106D), and five non-synonymous (V404I, N458K, D490N,
F1141I, G1195R) polymorphisms were found in the pM5 gene.

Mutations in the gene encoding ABCR (ABCA4), a photoreceptor-specific
ATP-binding cassette (ABC) transporter, are responsible for autosomal recessive Stargardt disease
(STGD), an early onset macular degeneration, and some forms of autosomal recessive
cone-rod dystrophy and autosomal recessive retinitis pigmentosa. Heterozygosity for ABCA4
mutations may also represent a risk factor for age-related macular degeneration (AMD). An
ongoing challenge in the analysis of ABCA4-based retinopathies arises from the observation
that most of the ABCA4 sequence variants identified so far are missense mutations that are
rare in both patient and control populations. With the current sample size of most sequence
variants, one cannot determine statistically whether a particular sequence variant is pathogenic
or neutral. A related challenge is to determine the degree to which each pathogenic variant
impairs ABCR function, as genotype-phenotype analyses indicate that age of onset and disease
severity correlate with different ABCA4 alleles. To address these questions, a functional
analysis of human ABCR and its variants was performed. These experiments reveal a wide
spectrum of biochemical defects in these variants and provide insight into the transport
mechanism of ABCR. (See Sun et al., Nat Genet 2000 Oct;26(2):242-6).

The disclosed NOV14 nucleic acid of the invention encoding an ABC transporter-like
protein includes the nucleic acid whose sequence is provided in Table 14A or a fragment
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may
be changed from the corresponding base shown in Table 14A while still encoding a protein
that maintains its ABC transporter-like activities and physiological functions, or a fragment of
such a nucleic acid. The invention further includes nucleic acids whose sequences are
complementary to those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 32 percent of the bases may be so changed.
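
The complementary nucleic acids referred to above can be derived mechanically from a listed strand. The following is a minimal sketch assuming plain A/C/G/T DNA with no ambiguity codes or modified bases; the function name is illustrative.

```python
# Watson-Crick pairing for unmodified DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand written 5' to 3'."""
    cleaned = "".join(dna.split())  # ignore spaces and line breaks
    return cleaned.translate(COMPLEMENT)[::-1]


# First five codons of the NOV14 coding region in Table 14A.
print(reverse_complement("ATGGTTAAAAAAGAG"))
```

An antisense fragment designed against a stretch of the coding strand would have the sequence produced this way, before any chemical modification of bases or backbone is considered.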

The disclosed NOV14 protein of the invention includes the ABC transporter-like
protein whose sequence is provided in Table 14B. The invention also includes a mutant or
variant protein any of whose residues may be changed from the corresponding residue shown
in Table 14B while still encoding a protein that maintains its ABC transporter-like activities
and physiological functions, or a functional fragment thereof. In the mutant or variant protein,
up to about 44 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this ABC
transporter-like protein (NOV14) may function as a member of an "ABC transporter family". Therefore,
the NOV14 nucleic acids and proteins identified here may be useful in potential therapeutic
applications implicated in (but not limited to) various pathologies and disorders as indicated
below. The potential therapeutic applications for this invention include, but are not limited to:
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV14 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the ABC transporter-like
protein (NOV14) may be useful in gene therapy, and the ABC transporter-like protein
(NOV14) may be useful when administered to a subject in need thereof. By way of
nonlimiting example, the compositions of the present invention will have efficacy for
treatment of patients suffering from cancer, trauma, regeneration (in vitro and in vivo),
viral/bacterial/parasitic infections, psoriasis, actinic keratosis, tuberous sclerosis, acne, hair
growth/loss, alopecia, pigmentation disorders, endocrine disorders, or other pathologies or
conditions. The NOV14 nucleic acid encoding the ABC transporter-like protein of the
invention, or fragments thereof, may further be useful in diagnostic applications, wherein the
presence or amount of the nucleic acid or the protein is to be assessed.

NOV14 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immunospecifically to the novel NOV14 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the 
"Anti-NOVX Antibodies" section below. The disclosed NOV14 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 

NOV15 

A disclosed NOV15 nucleic acid of 2540 nucleotides (also referred to as CG57652-01) 
encoding a diacylglycerol kinase alpha-like protein is shown in Table 15A. Putative 
untranslated regions upstream and/or downstream from the coding region, if any, are 
underlined, and the start and stop codons are in bold letters. 

Table 15A. NOV15 nucleotide sequence (SEQ ID NO:39). 

GGGGCGG TCGCAGC TGAAGCAGGCCTACCCTCTGAAGAGGTCCAAGCAACGGAAGTACTA 
CTACGAAGCTGCCTT TCTGGCC ATCCTTGAGAAAAATAGACAG ATGGCCAAGGAGAGGGG 
CCTAATAAGCCCCAGTGATTTTGCCCAGCTGCAAAAATACATGGAATACTCCACCAAAAA 
GGTCAGTGATGTCCTAAAGCTCTTCGAGGATGGCGAGATGGCTAAATATGTCCAAGGAGA 
TGCCATTGGGTACGAGGGATTCCAGCAATTCCTGAAAATCTATCTCGAAGTGGATAATGT 
TCCCAGACACCTAAGCCTGGCACTGTTTCAATCCTTTGAGACTGGTCACTGCTTAAATGA 
GACAAATGTGACAAAAGATGTGGTGTGTCTCAATGATGTTTCCTGCTACTTTTCCCTTCT 
GGAGGGTGGTCGGCCAGAAGACAAGTTAGAATTCACCTTCAAGCTGTACGACACGGACAG 
AAATGGGATCCTGGACAGCTCAATGATGCGAGTGGCTGAATACCTGGATTGGGATGTGTC 
TGAGCTGAGGCCGATTCTTCAGGAGATGATGAAAGAGATTGACTATGATGGCAGTGGCTC 
TGTCTCTCAAGCTGAGTGGGTCCGGGCTGGGGCCACCACCGTGCCACTGCTAGTGCTGCT 
GGGTCTGGAGATGACTCTGAAGGACGACGGACAGCACATGTGGAGGCCCAAGAGGTTCCC 
CAGACCAGTCTACTGCAATCTGTGCGAGTCAAGCATTGGTCTTGGCAAACAGGGACTGAG 
CTGTAACCTCTGTAAGTACACTGTTCACGACCAGTGTGCCATGAAAGCCCTGCCTTGTGA 
AGTCAGCACCTATGCCAAGTCTCGGAAGGACATTGGTGTCCAATCACATGTGTGGGTGCG 
AGGAGGCTGTGAGTCCGGGCGCTGCGACCGCTGTCAGAAAAAGATCCGGATCTACCACAG 
TCTGACCGGGCTGCATTGTGTATGGTGCCACCTAGAGATCCACGATGACTGCCTGCAAGC 
GGTGGGCCATGAGTGTGACTGTGGGCTGCTCCGGGATCACATCCTGCCTCCATCTTCCAT 
CTATCCCAGTGTCCTGGCCTCTGGACCGGATCGTAAAAATAGCAAAACAAGCCAGAAGAC 
CATGGATGATTTAAATTTGAGCACCTCTGAGGCTCTGCGGATTGACCCTGTTCCTAACAC 
CCACCCACTTCTCGTCTTTGTCAATCCTAAGAGTGGCGGGAAGCAGGGGCAGAGGGTGCT 
CTGGAAGTTCCAGTATATATTAAACCCTCGACAGGTGTTCAACCTCCTAAAGGATGGTCC 
TGAGATAGGGCTCCGATTATTCAAGGATGTTCCTGATAGCCGGATTTTGGTGTGTGGTGG 
AGACGGCACAGTAGGCTGGATTCTAGAGACCATTGACAAAGCTAACTTGCCAGTTTTGCC 
TCCTGTTGCTGTGTTGCCCCTGGGTACTGGAAATGATCTGGCTCGATGCCTAAGATGGGG 
AGGAGGTTATGAAGGACAGAATCTGGCAAAGATCCTCAAGGATTTAGAGATGAGTAAAGT 
GGTACATATGGATCGATGGTCTGTGGAGGTGATACCTCAACAAACTGAAGAAAAAAGTGA 
CCCAGTCCCCTTTCAAATCATCAATAACTACTTCTCTATTGGCGTGGATGCCTCTATTGC 
TCATCGATTCCACATCATGCGAGAGAAATATCCGGAGAAGTTCAACAGCAGAATGAAGAA 
CAAGCTATGGTACTTCGAATTTGCCACATCTGAATCCATCTTCTCAACATGCAAAAAGCT 
GGAGGAGTCTTTGACAGTTGAGATCTGTGGGAAACCGCTGGATCTGAGCAACCTGTCCCT 
AGAAGGCATCGCAGTGCTAAACATCCCTAGCATGCATGGTGGCTCCAACCTCTGGGGTGA 
TACCAGGAGACCCCATGGGGATATCTATGGGATCAACCAGGCCTTAGGTGCTACAGCTAA 
AGTCATCACCGACCCTGATATCCTGAAAACCTGTGTACCAGACCTAAGTGACAAGAGACT 
GGAAGTGGTTGGGCTGGAGGGTGCAATTGAGATGGGCCAAATCTATACCAAGCTCAAGAA 
TGCTGGACGTCGGCTGGCCAAGTGCTCTGAGATCACCTTCCACACCACAAAAACCCTTCC 
CATGCAAATTGACGTAGAACCCTGGATGCAGACGCCCTGTACAATCAAGATCACCCACAA 
GAACCAGATGCCCATGCTCATGGGCCCACCCCCCCGCTCCACCAATTTCTTTGGCTTCTT 
GAGCTAAGGGGGACACCCTTGGCCTCCAAGCCAGC CTTGAACCCACCTCCCT GTCCCTGG 
ACTCTACTCCCGAGGCTC TGTACATTGCTGCCACATACTCCTGCCAGCTTGGGGGAGTGT 
TCCTTCACCCTCACAGTATTTATTATCCTGCACCACCTCACTGTTCCCCATGCGCACACA 
CATACACACACCCCAAAACACATACATTGAAAGTGCCTCATCTGAATAAAATGACTTGTG 
TTTCCCTTTGGGATCTGCTG 

In a search of public sequence databases, the NOV15 nucleic acid sequence, located on 
chromosome 12, has 2038 of 2038 bases (100%) identical to a 
gb:GENBANK-ID:HSDKRNA|acc:X62535.1 mRNA from Homo sapiens (H. sapiens mRNA for 
diacylglycerol kinase). Public nucleotide databases include all GenBank databases and the 
GeneSeq patent database. 

The disclosed NOV15 polypeptide (SEQ ID NO:40) encoded by SEQ ID NO:39 has 
727 amino acid residues and is presented in Table 15B using the one-letter amino acid code. 
SignalP, Psort and/or Hydropathy results predict that NOV15 has no signal peptide and is 
likely to be localized in the nucleus with a certainty of 0.3000. 
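
By way of illustration only, the relationship between the coding region of SEQ ID NO:39 and the polypeptide of SEQ ID NO:40 can be sketched as a codon-by-codon translation using the standard genetic code. The names `CODON_TABLE` and `translate` are hypothetical helpers introduced for this sketch and are not part of the disclosure; the input fragments are taken from the start and the end of the coding region shown in Table 15A.

```python
# Minimal sketch (assumption: standard genetic code, reading frame starts at index 0).
BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(cds: str) -> str:
    """Translate a coding sequence into one-letter amino acid code.

    Whitespace is ignored; translation stops at the first stop codon.
    Ambiguous bases (e.g. 'N') are not handled and raise KeyError.
    """
    cds = "".join(cds.split()).upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First ten codons following the start codon position in Table 15A.
print(translate("ATGGCCAAGGAGAGGGGCCTAATAAGCCCC"))
# Last three codons and the stop codon of the coding region.
print(translate("TTCTTGAGCTAA"))
```

Applied to the full coding region, a routine of this kind would be expected to yield the 727-residue sequence of Table 15B, although this has not been verified here beyond the fragments shown.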



Table 15B. Encoded NOV15 protein sequence (SEQ ID NO:40). 

MAKERGLISPSDFAQLQKYMEYSTKKVSDVLKLFEDGEMAKYVQGDAIGYEGFQQFLKIY 
LEVDNVPRHLSLALFQSFETGHCLNETNVTKDVVCLNDVSCYFSLLEGGRPEDKLEFTFK 
LYDTDRNGILDSSMMRVAEYLDWDVSELRPILQEMMKEIDYDGSGSVSQAEWVRAGATTV 
PLLVLLGLEMTLKDDGQHMWRPKRFPRPVYCNLCESSIGLGKQGLSCNLCKYTVHDQCAM 
KALPCEVSTYAKSRKDIGVQSHVWVRGGCESGRCDRCQKKIRIYHSLTGLHCVWCHLEIH 
DDCLQAVGHECDCGLLRDHILPPSSIYPSVLASGPDRKNSKTSQKTMDDLNLSTSEALRI 
DPVPNTHPLLVFVNPKSGGKQGQRVLWKFQYILNPRQVFNLLKDGPEIGLRLFKDVPDSR 
ILVCGGDGTVGWILETIDKANLPVLPPVAVLPLGTGNDLARCLRWGGGYEGQNLAKILKD 
LEMSKVVHMDRWSVEVIPQQTEEKSDPVPFQIINNYFSIGVDASIAHRFHIMREKYPEKF 
NSRMKNKLWYFEFATSESIFSTCKKLEESLTVEICGKPLDLSNLSLEGIAVLNIPSMHGG 
SNLWGDTRRPHGDIYGINQALGATAKVITDPDILKTCVPDLSDKRLEVVGLEGAIEMGQI 
YTKLKNAGRRLAKCSEITFHTTKTLPMQIDVEPWMQTPCTIKITHKNQMPMLMGPPPRST 
NFFGFLS 



A search of sequence databases reveals that the NOV15 amino acid sequence has 727 
of 735 amino acid residues (98%) identical to, and 727 of 735 amino acid residues (98%) 
similar to, the 735 amino acid residue ptnr:SWISSNEW-ACC:P23743 protein from Homo 
sapiens (Human) (DIACYLGLYCEROL KINASE, ALPHA (EC 2.7.1.107) (DIGLYCERIDE 
KINASE) (DGK-ALPHA) (DAG KINASE ALPHA) (80 KDA DIACYLGLYCEROL 
KINASE)). Public amino acid databases include the GenBank databases, SwissProt, PDB and 
PIR. 
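
As a hedged illustration of how an identity figure such as "727 of 735 (98%)" relates to the underlying counts, the sketch below computes a whole-number percentage from the number of matching residues and the alignment length. It assumes, without confirmation from the search tool's documentation, that the reported value is truncated rather than rounded, which is consistent with 727/735 being reported as 98%. The function name `percent_identity` is hypothetical.

```python
def percent_identity(matches: int, aligned_length: int) -> int:
    """Return the whole-number percentage of matching positions.

    Assumption: the percentage is truncated toward zero (not rounded),
    so 727 matches over 735 aligned positions gives 98, not 99.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= matches <= aligned_length:
        raise ValueError("matches must lie between 0 and aligned_length")
    # Integer arithmetic avoids floating-point edge cases.
    return (100 * matches) // aligned_length


print(percent_identity(727, 735))  # NOV15 versus the 735-residue human protein
```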

NOV15 is expressed in at least Aorta, Appendix, Ascending Colon, Bone, Bone 
Marrow, Brain, Bronchus, Cartilage, Cervix, Colon, Coronary Artery, Dermis, Heart, 
Hippocampus, Kidney, Left cerebellum, Liver, Lung, Lymph node, Lymphoid tissue, 
Mammary gland/Breast, Ovary, Pancreas, Parotid Salivary glands, Peripheral Blood, Pituitary 
Gland, Placenta, Prostate, Respiratory Bronchiole, Small Intestine, Spleen, Stomach, 
Substantia Nigra, Synovium/Synovial membrane, Temporal Lobe, Testis, Thymus, Tonsils, 
Trachea, Umbilical Vein, Uterus, Vein, Whole Organism. Expression information was derived 
from the tissue sources of the sequences that were included in the derivation of the sequence 
of CG57652-01. The sequence is predicted to be expressed in the following tissues because of 
the expression pattern of gb:GENBANK-ID:HSDKRNA|acc:X62535.1, a closely related 
H. sapiens mRNA for diacylglycerol kinase homolog in species Homo sapiens: 
lymphocytes, oligodendroglial cells, and neutrophils. This information was derived by 
determining the tissue sources of the sequences that were included in the invention including 
but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE 
sources. 

The disclosed NOV15 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 15C. 



Table 15C. BLAST results for NOV15 

Gene Index/Identifier                          Protein/Organism                      Length (aa)  Identity (%)   Positives (%)  Expect 
gi|11415024|ref|NP_001336.1| (NM_001345)       diacylglycerol kinase, alpha (80kD)   735          727/735 (98%)  727/735 (98%)  0.0 
                                               [Homo sapiens] 
gi|125323|sp|P20192|KDGA_PIG                   Diacylglycerol kinase, alpha          734          679/734 (92%)  703/734 (95%)  0.0 
                                               (Diglyceride kinase) (DGK-alpha) 
                                               (DAG kinase alpha) (80 kDa 
                                               diacylglycerol kinase) 
gi|13879470|gb|AAH06713.1|AAH06713 (BC006713)  diacylglycerol kinase, alpha (80 kDa) 730          597/736 (81%)  647/736 (87%)  0.0 
                                               [Mus musculus] 
gi|18158459|ref|NP_542965.1| (NM_080787)       diacylglycerol kinase, alpha (80kD)   727          600/736 (81%)  646/736 (87%)  0.0 
                                               [Rattus norvegicus] 
gi|7949033|ref|NP_058091.1| (NM_016811)        diacylglycerol kinase, alpha (80 kDa) 730          594/736 (80%)  645/736 (86%)  0.0 
                                               [Mus musculus] 

Tables 15D-E list the domain descriptions from DOMAIN analysis results against 
NOV15. This indicates that the NOV15 sequence has properties similar to those of other 
proteins known to contain these domains. 



Table 15D. Domain Analysis of NOV15 

gnl|Pfam|pfam00609, DAGKa, Diacylglycerol kinase accessory domain 
(presumed). Diacylglycerol (DAG) is a second messenger that acts as a 
protein kinase C activator 

CD-Length = 170 residues, 100.0% aligned 

Score = 275 bits (702), Expect = 9e-75 



Table 15E. Domain Analysis of NOV15 

gnl|Smart|smart00109, C1, Protein kinase C conserved region 1 (C1) 
domains (Cysteine-rich domains); Some bind phorbol esters and 
diacylglycerol. Some bind RasGTP. Zinc-binding domains. 

CD-Length = 50 residues, 96.0% aligned 

Score = 53.5 bits (127), Expect = 4e-08 



Diacylglycerol (DAG) functions in intracellular signaling pathways as an allosteric 
activator of protein kinase C (PKC; see 600448). In addition, DAG appears to play a role in 
regulating RAS (see 190020) and RHO (see 165370) family proteins by activating the guanine 
nucleotide exchange factors VAV (164875) and RASGRP1 (603962). DAG also occupies a 
central position in the synthesis of major phospholipids and triacylglycerols. Thus, to maintain 
cellular homeostasis, intracellular DAG levels must be tightly regulated (Topham and Prescott, 
1999). DAG kinases (DGKs or DAGKs; EC 2.7.1.107) phosphorylate DAG to phosphatidic 
acid, thus removing DAG. In intracellular signaling pathways, DAGK can be viewed as a 
modulator that competes with PKC for the second messenger DAG. Schaap et al. (1990) 
purified and characterized an 86-kD DAGK from normal human white blood cells. Based on 
partial amino acid sequences of the purified enzyme, primers were designed that permitted 
cloning of the human DAGK cDNA by use of PCR. The sequence demonstrated that it is the 
human homolog of the porcine gene. The human DAGK cDNA, transfected into COS-7 cells, 
resulted in a 6- to 7-fold increase in enzyme activity. 
Several mammalian isozymes of DAGK have been identified. The isoform described 
by Schaap et al. (1990) has been designated DGK-alpha or DAGK1. Topham and Prescott 
(1999) stated that all DGKs have a conserved catalytic domain and at least 2 cysteine-rich 
regions homologous to the C1A and C1B motifs of PKCs. Most DGKs have structural motifs 
that are likely to play regulatory roles, and these motifs form the basis for dividing the DGKs 
into 5 subtypes. Type I DGKs, such as DGK-alpha, -beta (604070), and -gamma (601854), 
have calcium-binding EF-hand motifs at their N termini. DGK-delta (601826) and DGK-eta 
(604071) contain N-terminal pleckstrin homology (PH) domains and are defined as type II. 
DGK-epsilon (601440) contains no identifiable regulatory domains and is a type III DGK. The 
defining characteristic of type IV isozymes, such as DGK-zeta (601441) and -iota (604072), is 
that they have C-terminal ankyrin repeats. Group V is exemplified by DGK-theta (601207), 
which contains 3 cysteine-rich domains and a PH domain. 

Pilz et al. (1995) pointed to the growing evidence to support some form of 
light-activated phosphoinositide signal transduction pathway in the mammalian retina. Although 
this pathway had no obvious role in mammalian phototransduction, mutations in this pathway 
were known to cause retinal degeneration in Drosophila. For example, the 'retinal degeneration 
A' mutant in Drosophila is caused by an alteration in the eye-specific DAGK gene. In an effort 
to consider genes mutated in Drosophila as candidates for mammalian eye disease, Pilz et al. 
(1995) determined the map position of 3 DAGK genes in the mouse. They localized the mouse 
homolog of DAGK1 to chromosome 10 by linkage analysis. By Southern blot analysis of 
human-hamster somatic cell hybrid DNA, Hart et al. (1994) assigned the DAGK gene to 
chromosome 12. Hart et al. (1994) further localized the gene to 12q13.3 by fluorescence in 
situ hybridization. 

The disclosed NOV15 nucleic acid of the invention encoding a diacylglycerol kinase 
alpha-like protein includes the nucleic acid whose sequence is provided in Table 15A or a 
fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose 
bases may be changed from the corresponding base shown in Table 15A while still encoding a 
protein that maintains its diacylglycerol kinase alpha-like activities and physiological 
functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids 
whose sequences are complementary to those just described, including nucleic acid fragments 
that are complementary to any of the nucleic acids just described. The invention additionally 
includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures 
include chemical modifications. Such modifications include, by way of nonlimiting example, 
modified bases, and nucleic acids whose sugar phosphate backbones are modified or 
derivatized. These modifications are carried out at least in part to enhance the chemical 
stability of the modified nucleic acid, such that they may be used, for example, as antisense 
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic 
acids, and their complements, up to about 0 percent of the bases may be so changed. 

The disclosed NOV15 protein of the invention includes the diacylglycerol kinase 
alpha-like protein whose sequence is provided in Table 15B. The invention also includes a 
mutant or variant protein any of whose residues may be changed from the corresponding 
residue shown in Table 15B while still encoding a protein that maintains its diacylglycerol 
kinase alpha-like activities and physiological functions, or a functional fragment thereof. In 
the mutant or variant protein, up to about 2 percent of the residues may be so changed. 
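
For orientation only, the following sketch shows one way the "up to about 2 percent of the residues" limit could be checked for a candidate variant of the same length as the 727-residue reference. It counts position-by-position substitutions and does not handle insertions or deletions; the helper names `max_changes` and `fraction_changed` are hypothetical and are not taken from the disclosure.

```python
def max_changes(length: int, percent: int) -> int:
    """Largest whole number of residues not exceeding `percent` of `length`."""
    return (length * percent) // 100


def fraction_changed(reference: str, variant: str) -> float:
    """Fraction of positions at which two equal-length sequences differ.

    Assumption: substitutions only; sequences of different length are rejected.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    differences = sum(1 for r, v in zip(reference, variant) if r != v)
    return differences / len(reference)


# For the 727-residue NOV15 protein, a 2 percent limit allows 14 substitutions.
print(max_changes(727, 2))
# Toy example: one substitution in the first ten residues of Table 15B.
print(fraction_changed("MAKERGLISP", "MAKDRGLISP"))
```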

The invention further encompasses antibodies and antibody fragments, such as Fab or 
(Fab)2, that bind immunospecifically to any of the proteins of the invention. 

The above defined information for this invention suggests that this diacylglycerol 
kinase alpha-like protein (NOV15) may function as a member of a "diacylglycerol kinase 
alpha family". Therefore, the NOV15 nucleic acids and proteins identified here may be useful 
in potential therapeutic applications implicated in (but not limited to) various pathologies and 
disorders as indicated below. The potential therapeutic applications for this invention include, 
but are not limited to: protein therapeutic, small molecule drug target, antibody target 
(therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic 
marker, gene therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo 
and in vitro of all tissues and cell types composing (but not limited to) those defined here. 

The NOV15 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the diacylglycerol kinase 
alpha-like protein (NOV15) may be useful in gene therapy, and the diacylglycerol kinase 
alpha-like protein (NOV15) may be useful when administered to a subject in need thereof. By 
way of nonlimiting example, the compositions of the present invention will have efficacy for 
treatment of patients suffering from osteoporosis, hypercalcemia, arthritis, ankylosing 
spondylitis, scoliosis, hemophilia, hypercoagulation, idiopathic thrombocytopenic purpura, 
autoimmune disease, allergies, asthma, immunodeficiencies, transplantation, graft versus host 
disease, Von Hippel-Lindau (VHL) syndrome, Alzheimer's disease, stroke, tuberous sclerosis, 
hypercalcemia, Parkinson's disease, Huntington's disease, cerebral palsy, epilepsy, 
Lesch-Nyhan syndrome, multiple sclerosis, ataxia-telangiectasia, leukodystrophies, behavioral 
disorders, addiction, anxiety, pain, neurodegeneration, tendonitis, fertility, atherosclerosis, 
aneurysm, hypertension, fibromuscular dysplasia, scleroderma, myocardial infarction, 
embolism, cardiovascular disorders, bypass surgery, cardiomyopathy, atherosclerosis, 
congenital heart defects, aortic stenosis, atrial septal defect (ASD), atrioventricular (A-V) 
canal defect, ductus arteriosus, pulmonary stenosis, subaortic stenosis, ventricular septal defect 
(VSD), valve diseases, renal artery stenosis, interstitial nephritis, glomerulonephritis, 
polycystic kidney disease, renal tubular acidosis, IgA nephropathy, cirrhosis, systemic lupus 
erythematosus, emphysema, ARDS, lymphedema, endometriosis, diabetes, pancreatitis, 
obesity, anemia, ataxia-telangiectasia, endocrine dysfunctions, growth and reproductive 
disorders, inflammatory bowel disease, diverticular disease, ulcers, tonsillitis, ARDS, anemia, 
bleeding disorders, or other pathologies or conditions. The NOV15 nucleic acid encoding the 
diacylglycerol kinase alpha-like protein of the invention, or fragments thereof, may further be 
useful in diagnostic applications, wherein the presence or amount of the nucleic acid or the 
protein is to be assessed. 

NOV15 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immunospecifically to the novel NOV15 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the 
"Anti-NOVX Antibodies" section below. The disclosed NOV15 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 
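
The use of hydrophobicity charts to locate hydrophilic regions can be sketched, under stated assumptions, as a sliding-window average over a per-residue hydropathy scale. The disclosure does not name a particular scale; the Kyte-Doolittle values below, the window of 7 residues, the cutoff of -1.5, and the function names `hydropathy_profile` and `hydrophilic_windows` are all assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values (assumed scale; more negative = more hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 7) -> list:
    """Average hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophilic_windows(sequence: str, window: int = 7, cutoff: float = -1.5) -> list:
    """(start index, average) for windows whose average is at or below the cutoff."""
    profile = hydropathy_profile(sequence, window)
    return [(i, avg) for i, avg in enumerate(profile) if avg <= cutoff]


# A stretch taken from Table 15B; candidate immunogenic windows are listed.
for start, average in hydrophilic_windows("LASGPDRKNSKTSQKTMDDLNLSTS"):
    print(start, round(average, 2))
```

Windows flagged in this way are only candidates; surface exposure and antigenicity would still need to be assessed by the methods referred to in the "Anti-NOVX Antibodies" section.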

NOV16 

A disclosed NOV16 nucleic acid of 3904 nucleotides (also referred to as CG57562-01) 
encoding a cation-transporting ATPase-like protein is shown in Table 16A. Putative 
untranslated regions upstream and/or downstream from the coding region, if any, are 
underlined, and the start and stop codons are in bold letters. 



Table 16A. NOV16 nucleotide sequence (SEQ ID NO:41). 



TTACCGGAAGTAAAA CTTCGGA AGTGAGGCGTTCCTCTGCCCGGAAGTGAGC GCGGCGCT 
AGGAAAGA TGGCGGCAGCGGCGGCGGTGGGCAACGCGGTGCCCTGCGGGGCCCGGCCTTG 
CGGGGTCCGGCCTGACGGGCAGCCCAAGCCCGGGCGCAGCCGGCGCGCGCTCCTTGCCGC 
CGGGCCGGCGCTCATAGCGAACGGTGACGAGCTGGTGGCTGCCGTGTGGCCGTACCGGCG 
GTTGGCGCTGTTGCGGCGCCTCACGGTGCTGCCATTCGCCGGGCTGCTTTACCCGGCCTG 
GTTGGGTGCCGCAGCCGCTGGCTGCTGGGGCTGGGGCAGCAGTTGGGTGCAGATCCCCGA 
AGCTGCGCTGCTCGTGCTTGCCACCATCTGCCTCGCGCACGCGCTCACTGTCCTCTCGGG 
GCATTGGTCTGTGCACGCGCATTGCGCGCTCACCTGCACCCCGGAGTACGACCCCAGCAA 
AGCGACCTTTGTGAAGGTGGTGCCAACCCCCAACAATGGCTCCACGGAGCTCGTGGCCCT 
GCACCGCAATGAGGGCGAAGACGGGCTTGAGGTGCTGTCCTTCGAATTCCAGAAGATCAA 
GTATTCCTACGATGCCCTGGAGAAGAAGCAGTTTCTCCCCGTGGCCTTTCCTGTGGGAAA 
CGCCTTCTCATACTATCAGAGCAACAGAGGCTTCCAGGAAGACTCAGAGATCCGAGCAGC 
TGAGAAGAAATTTGGGAGCAACAAGGCCGAGATGGTGGTGCCTGACTTCTCGGAGCTTTT 
CAAGGAGAGAGCCACAGCCCCCTTCTTTGTATTTCAGGTGTTCTGTGTGGGGCTCTGGTG 
CCTGGATGAGTACTGGTACTACAGCGTCTTTACGCTATCCATGCTGGTGGCGTTCGAGGC 
CTCGCTGGTGCAGCAGCAGATGCGGAACATGTCGGAGATCCGGAAGATGGGCAACAAGCC 
CCACATGATCCAGGTCTACCGAAGCCGCAAGTGGAGGCCCATTGCCAGTGATGAGATCGT 
ACCAGGGGACATCGTCTCCATCGGCCGCTCCCCACAGGAGAACCTGGTGCCATGTGACGT 
GCTTCTGCTGCGAGGCCGCTGCATCGTAGACGAGGCCATGCTCACGGGGGAGTCCGTGCC 
ACAGATGAAGGAGCCCATCGAAGACCTCAGCCCAGACCGGGTGCTGGACCTCCAGGCTGA 
TTCCCGGCTGCACGTCATCTTCGGGGGCACCAAGGTGGTGCAGCACATCCCCCCACAGAA 
AGCCACCACGGGCCTGAAGCCGGTTGACAGCGGGTGCGTGGCCTACGTCCTGCGGACCGG 
ATTCAACACATCCCAGGGCAAGCTGCTGCGCACCATCCTCTTCGGGGTCAAGAGGGTGAC 
TGCGAACAACCTGGAGACCTTCATCTTCATCCTCTTCCTCCTGGTGTTTGCCATCGCTGC 
AGCTGCCTATGTATGGATTGAAGGTACCAAGGACCCCAGCCGGAACCGCTACAAGCTGTT 
TCTGGAGTGCACCCTGATCCTCACCTCGGTCGTGCCTCCTGAGCTGCCCATCGAGCTGTG 
CCTGGCCGTCAACACCTCCCTCATCGCCCTGGCCAAGCTCTACATGTACTGCACAGAGCC 
CTTCCGGATCCCCTTTGCTGGCAAGGTCGAGGTGTGCTGCTTTGACAAGACGGGGACGTT 
GACCAGTGACAGCCTGGTGGTGCGCGGTGTGGCCGGGCTGAGAGACGGGAAGGAGGTGAC 
CCCAGTGTCCAGCATCCCTGTAGAAACACACCGGGCCCTGGCCTCGTGCCACTCGCTCAT 
GCAGCTGGACGACGGCACCCTCGTGGGTGACCCTCTAGAGAAGGCCATGCTGACGGCCGT 
GGACTGGACGCTGACCAAAGATGAGAAAGTATTCCCCCGAAGTATTAAAACTCAGGGGCT 
GAAAATTCACCAGCGCTTTCATTTTGCCAGTGCCCTGAAGCGAATGTCCGTGCTTGCCTC 
GTATGAGAAGCTGGGCTCCACCGACCTCTGCTACATCGCGGCCGTGAAGGGGGCCCCCGA 
AACTCTGCACTCCATGTTCTCCCAGTGCCCGCCCGACTACCACCACATCCACACCGAGAT 
CTCCCGGGAAGGAGCCCGCGTCCTGGCGCTGGGGTACAAGGAGCTGGGACACCTCACTCA 
CCAGCAGGCCCGGGAGGTCAAGCGGGAGGCCCTGGAGTGCAGCCTCAAGTTCGTCGGCTT 
CATTGTGGTCTCCTGCCCGCTCAAGGCTGACTCCAAGGCCGTGATCCGGGAGATCCAGAA 
TGCGTCCCACCGGGTGGTCATGATCACGGGAGACAACCCGCTCACTGCATGCCACGTGGC 
CCAGGAGCTGCACTTCATTGAAAAGGCCCACACGCTGATCCTGCAGCCTCCCTCCGAGAA 
AGGCCGGCAGTGCGAGTGGCGCTCCATTGACGGCAGCATCGTGCTGCCCCTGGCCCGGGG 
CTCCCCAAAGGCACTGGCCCTGGAGTACGCACTGTGCCTCACAGGCGACGGCTTGGCCCA 
CCTGCAGGCCACCGACCCCCAGCAGCTGCTCCGCCTCATCCCCCATGTGCAGGTGTTCGC 
CCGTGTGGCTCCCAAGCAGAAGGAGTTTGTCATCACCAGCCTGAAGGAGCTGGGCTACGT 
GACCCTCATGTGTGGGGATGGCACCAACGACGTGGGCGCCCTGAAGCATGCTGACGTGGG 
TGTGGCGCTCTTGGCCAATGCCCCTGAGCGGGTTGTCGAGCGGCGACGGCGGCCCCGGGA 
CAGCCCAACCCTGAGCAACAGTGGCATCAGAGCCACCTCCAGGACAGCCAAGCAGCGGTC 
GGGGCTCCCTCCCTCCGAGGAGCAGCCAACCTCCCAGAGGGACCGCCTGAGCCAGGTGCT 
GCGAGACCTCGAGGACGAGAGTACGCCCATTGTGAAACTGGGGGATGCCAGCATCGCAGC 
ACCCTTCACCTCCAAGCTCTCATCCATCCAGTGCATCTGCCACGTGATCAAGCAGGGCCG 
CTGCACGCTGGTGACCACGCTACAGATGTTCAAGATCCTGGCGCTCAATGCCCTCATCCT 
GGCCTACAGCCAGAGCGTCCTCTACCTGGAGGGAGTCAAGTTCAGTGACTTCCAGGCCAC 
CCTACAGGGGCTGCTGCTGGCCGGCTGCTTCCTCTTCATCTCCCGTTCCAAGCCCCTCAA 
GACCCTCTCCCGAGAACGGCCCCTGCCCAACATCTTCAACCTGTACACCATCCTCACCGT 
CATGCTCCAGTTCTTTGTGCACTTCCTGAGCCTTGTCTACCTGTACCGTGAGGCCCAGGC 
CCGGAGCCCCGAGAAGCAGGAGCAGTTCGTGGACTTGTACAAGGAGTTTGAGCCAAGCCT 
GGTCAACAGCACCGTCTACATCATGGCCATGGCCATGCAGATGGCCACCTTCGCCATCAA 
TTACAAAGGCCCGCCCTTCATGGAGAGCCTGCCCGAGAACAAGCCCCTGGTGTGGAGTCT 
GGCAGTTTCACTCCTGGCCATCATTGGCCTGCTCCTCGGCTCCTCGCCCGACTTCAACAG 
CCAGTTTGGCCTCGTGGACATCCCTGTGGAGTTCAAGCTGGTCATTGCCCAGGTCCTGCT 
CCTGGACTTCTGCCTGGCGCTCCTGGCCGACCGCGTCCTGCAGTTCTTCCTGGGGACCCC 
GAAGCTGAAAGTGCCTTCCTG AGATGGCAGTGCTGGTACCCACTGCCCACCCTGGCTGCC 
GCTGGGCGGGAACCCCAACAGGG CCCCGGG AGGGAACCCTGCCCCCAACCCCCCACAGCA 
AGGCTGTACAGTCTCGCCCTTGGAAGACTGAGCTGGGACCCCCACAG CCATCCGCTGGCT 
TGGCCAGCAGAACCAGCCCCAAGCCAGCACCTTTGGTAAATAAAGCAGCATCTGAGATTT 
TAAA 
In a search of public sequence databases, the NOV16 nucleic acid sequence, located on 
chromosome 19, has 3442 of 3442 bases (100%) identical to a 
gb:GENBANK-ID:AF288687|acc:AF288687.1 mRNA from Homo sapiens (Homo sapiens CGI-152 protein 
mRNA, complete cds). Public nucleotide databases include all GenBank databases and the 
GeneSeq patent database. 

The disclosed NOV16 polypeptide (SEQ ID NO:42) encoded by SEQ ID NO:41 has 
1204 amino acid residues and is presented in Table 16B using the one-letter amino acid code. 
SignalP, Psort and/or Hydropathy results predict that NOV16 has a signal peptide and is 
likely to be localized at the plasma membrane with a certainty of 0.8000. 



Table 16B. Encoded NOV16 protein sequence (SEQ ID NO:42). 

MAAAAAVGNAVPCGARPCGVRPDGQPKPGRSRRALLAAGPALIANGDELVAAVWPYRRLA 
LLRRLTVLPFAGLLYPAWLGAAAAGCWGWGSSWVQIPEAALLVLATICLAHALTVLSGHW 
SVHAHCALTCTPEYDPSKATFVKVVPTPNNGSTELVALHRNEGEDGLEVLSFEFQKIKYS 
YDALEKKQFLPVAFPVGNAFSYYQSNRGFQEDSEIRAAEKKFGSNKAEMVVPDFSELFKE 
RATAPFFVFQVFCVGLWCLDEYWYYSVFTLSMLVAFEASLVQQQMRNMSEIRKMGNKPHM 
IQVYRSRKWRPIASDEIVPGDIVSIGRSPQENLVPCDVLLLRGRCIVDEAMLTGESVPQM 
KEPIEDLSPDRVLDLQADSRLHVIFGGTKVVQHIPPQKATTGLKPVDSGCVAYVLRTGFN 
TSQGKLLRTILFGVKRVTANNLETFIFILFLLVFAIAAAAYVWIEGTKDPSRNRYKLFLE 
CTLILTSVVPPELPIELSLAVNTSLIALAKLYMYCTEPFRIPFAGKVEVCCFDKTGTLTS 
DSLVVRGVAGLRDGKEVTPVSSIPVETHRALASCHSLMQLDDGTLVGDPLEKAMLTAVDW 
TLTKDEKVFPRSIKTQGLKIHQRFHFASALKRMSVLASYEKLGSTDLCYIAAVKGAPETL 
HSMFSQCPPDYHHIHTEISREGARVLALGYKELGHLTHQQAREVKREALECSLKFVGFIV 
VSCPLKADSKAVIREIQNASHRVVMITGDNPLTACHVAQELHFIEKAHTLILQPPSEKGR 
QCEWRSIDGSIVLPLARGSPKALALEYALCLTGDGLAHLQATDPQQLLRLIPHVQVFARV 
APKQKEFVITSLKELGYVTLMCGDGTNDVGALKHADVGVALLANAPERVVERRRRPRDSP 
TLSNSGIRATSRTAKQRSGLPPSEEQPTSQRDRLSQVLRDLEDESTPIVKLGDASIAAPF 
TSKLSSIQCICHVIKQGRCTLVTTLQMFKILALNALILAYSQSVLYLEGVKFSDFQATLQ 
GLLLAGCFLFISRSKPLKTLSRERPLPNIFNLYTILTVMLQFFVHFLSLVYLYREAQARS 
PEKQEQFVDLYKEFEPSLVNSTVYIMAMAMQMATFAINYKGPPFMESLPENKPLVWSLAV 
SLLAIIGLLLGSSPDFNSQFGLVDIPVEFKLVIAQVLLLDFCLALLADRVLQFFLGTPKL 
KVPS 



A search of sequence databases reveals that the NOV16 amino acid sequence has 1141 
of 1200 amino acid residues (95%) identical to, and 1164 of 1200 amino acid residues (97%) 
similar to, the 1200 amino acid residue ptnr:TREMBLNEW-ACC:BAB20095 protein from 
Mus musculus (Mouse) (CATION-TRANSPORTING ATPASE). Public amino acid databases 
include the GenBank databases, SwissProt, PDB and PIR. 

NOV16 is expressed in at least Adrenal Gland/Suprarenal gland, Amygdala, Aorta, 
Appendix, Artery, Bone, Bone Marrow, Brain, Bronchus, Brown adipose, Cartilage, Cerebral 
Medulla/Cerebral white matter, Cervix, Colon, Coronary Artery, Epidermis, Hair Follicles, 
Heart, Hippocampus, Kidney, Left cerebellum, Liver, Lung, Lymph node, Lymphoid tissue, 
Mammary gland/Breast, Ovary, Oviduct/Uterine Tube/Fallopian tube, Pancreas, Parietal Lobe, 
Peripheral Blood, Pituitary Gland, Placenta, Prostate, Respiratory Bronchiole, Right 
Cerebellum, Skeletal Muscle, Skin, Spinal Cord, Spleen, Stomach, Substantia Nigra, 
Synovium/Synovial membrane, Temporal Lobe, Testis, Thymus, Urinary Bladder, Uterus, 
Vein, and Vulva. This information was derived by determining the tissue sources of the 
sequences that were included in the invention including but not limited to SeqCalling sources, 
Public EST sources, Literature sources, and/or RACE sources. 

The disclosed NOV16 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 16C. 



Table 16C. BLAST results for NOV16 

Gene Index/Identifier                          Protein/Organism                      Length (aa)  Identity (%)     Positives (%)    Expect 
gi|14017867|dbj|BAB47454.1| (AB058728)         KIAA1825 protein [Homo sapiens]       1203         1200/1203 (99%)  1200/1203 (99%)  0.0 
gi|18202961|sp|Q9HD20|ATY2_HUMAN               Probable cation-transporting ATPase 2 1204         1201/1204 (99%)  1201/1204 (99%)  0.0 
                                               (CGI-152) 
gi|18202867|sp|Q9EPE9|ATY2_MOUSE               Probable cation-transporting ATPase 2 1200         1141/1200 (95%)  1164/1200 (96%)  0.0 
                                               (CATP) 
gi|9966897|ref|NP_065143.1| (NM_020410)        CGI-152 protein [Homo sapiens]        1086         1072/1072 (100%) 1072/1072 (100%) 0.0 
gi|18467838|ref|XP_079402.1| (XM_079402)       BcDNA:GH06032                         1225         611/1125 (54%)   803/1125 (71%)   0.0 
                                               [Drosophila melanogaster] 



Tables 16D-E list the domain descriptions from DOMAIN analysis results against 
NOV16. This indicates that the NOV16 sequence has properties similar to those of other 
proteins known to contain these domains. 



Table 16D. Domain Analysis of NOV16 

gnl|Pfam|pfam00122, E1-E2_ATPase, E1-E2 ATPase 

CD-Length = 223 residues, 57.4% aligned 

Score = 64.3 bits (155), Expect = 4e-11 

Table 16E. Domain Analysis of NOV16 

gnl|Pfam|pfam00702, Hydrolase, haloacid dehalogenase-like hydrolase. 
This family is structurally different from the alpha/beta hydrolase 
family (pfam00561). This family includes L-2-haloacid dehalogenase, 
epoxide hydrolases and phosphatases. The structure of the family 
consists of two domains. One is an inserted four helix bundle, which 
is the least well conserved region of the alignment, between residues 
16 and 96. The rest of the fold is composed of the core alpha/beta 
domain. 

CD-Length = 197 residues, 78.2% aligned 

Score = 39.7 bits (91), Expect = 0.001 

Regulation of cation homeostasis is of critical importance to the cell and defects in 
proteins that regulate this process have been shown to cause a number of human diseases, 
including Darier-White disease, Menkes disease and Wilson disease. The human plasma 
membrane Ca(2+)-ATPase (PMCA) isoforms are members of the P class of ion-motive 
ATPases. PMCA removes bivalent calcium ions from eukaryotic cells and plays a critical role 
in intracellular calcium homeostasis by its capacity for removing calcium ions from cells 
against very large concentration gradients. Together with the highly related SERCA1 and 
SERCA3 isoforms encoded by ATP2A1 and ATP2A3, respectively, SERCA2 belongs to the 
large family of P-type cation pumps that couple ATP hydrolysis with cation transport across 
membranes. SERCA pumps specifically maintain low cytosolic Ca(2+) concentrations by 
actively transporting Ca(2+) from the cytosol into the sarco/endoplasmic reticulum lumen. The 
ATP2A2 gene has been shown to be the site of mutations in Darier-White disease, an 
autosomal dominant skin disorder characterized by warty papules and plaques in seborrheic 
areas (central trunk, flexures, scalp, and forehead), palmoplantar pits, and distinctive nail 
abnormalities (See Sakuntabhai et al., (1999) Mutations in ATP2A2, encoding a Ca(2+) pump, 
cause Darier disease. Nature Genet. 21: 271-277). Patients with Menkes disease have 
mutations in the gene encoding Cu(2+)-transporting ATPase alpha polypeptide and display 
early retardation in growth, peculiar hair, and focal cerebral and cerebellar degeneration (See 
Chelly et al., (1993) Isolation of a candidate gene for Menkes disease that encodes a potential 
heavy metal binding protein. Nature Genet. 3: 14-19). Wilson disease is an autosomal 
recessive disorder caused by mutations in the ATP7B gene, which encodes a 
copper-transporting ATPase, and is characterized by dramatic build-up of intracellular hepatic copper 
with subsequent hepatic and neurologic abnormalities (See Bull et al., (1993) The Wilson 
disease gene is a putative copper transporting P-type ATPase similar to the Menkes gene. 
Nature Genet. 5: 327-337). 

The human cation transporting ATPase-like protein described in this invention is 
predicted to share the attributes of the other cation transporting ATPase family members and is 
thus implicated in the regulation of cation homeostasis. Given that a large number of cation 
transporting ATPases have been demonstrated to have a causative role in a variety of human 
diseases, the cation transporting ATPase-like protein is an attractive target for drug 
intervention in the treatment of human metabolic diseases, central nervous system disorders, 
immunological diseases and cancer, among others. The cation transporting ATPase-like gene 
described in this patent maps to human chromosome 19. 

The disclosed NOV16 nucleic acid of the invention encoding a cation-transporting 
ATPase-like protein includes the nucleic acid whose sequence is provided in Table 16A or a 
fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose 
bases may be changed from the corresponding base shown in Table 16A while still encoding a 
protein that maintains its cation-transporting ATPase-like activities and physiological 
functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids 
whose sequences are complementary to those just described, including nucleic acid fragments 
that are complementary to any of the nucleic acids just described. The invention additionally 
includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures 
include chemical modifications. Such modifications include, by way of nonlimiting example, 
modified bases, and nucleic acids whose sugar phosphate backbones are modified or 
derivatized. These modifications are carried out at least in part to enhance the chemical 
stability of the modified nucleic acid, such that they may be used, for example, as antisense 
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic 
acids, and their complements, up to about 0 percent of the bases may be so changed. 

The disclosed NOV16 protein of the invention includes the cation-transporting 
ATPase-like protein whose sequence is provided in Table 16B. The invention also includes a 
mutant or variant protein any of whose residues may be changed from the corresponding 
residue shown in Table 16B while still maintaining its cation-transporting 
ATPase-like activities and physiological functions, or a functional fragment 
thereof. In the mutant or variant protein, up to about 5 percent of the residues may be so 
changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or 
(Fab)2, that bind immunospecifically to any of the proteins of the invention. 

The above defined information for this invention suggests that this cation-transporting 
ATPase-like protein (NOV16) may function as a member of a "cation-transporting ATPase 
family". Therefore, the NOV16 nucleic acids and proteins identified here may be useful in 
potential therapeutic applications implicated in (but not limited to) various pathologies and 
disorders as indicated below. The potential therapeutic applications for this invention include, 
but are not limited to: protein therapeutic, small molecule drug target, antibody target 
(therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic 
marker, gene therapy (gene delivery/gene ablation), research tools, and tissue regeneration in vivo 
and in vitro of all tissues and cell types composing (but not limited to) those defined here. 

The NOV16 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the cation-transporting 
ATPase-like protein (NOV16) may be useful in gene therapy, and the cation-transporting 
ATPase-like protein (NOV16) may be useful when administered to a subject in need thereof. 
By way of nonlimiting example, the compositions of the present invention will have efficacy 
for treatment of patients suffering from cancer, trauma, bacterial and viral infections, in vitro 
and in vivo regeneration, cardiomyopathy, atherosclerosis, hypertension, congenital heart 
defects, aortic stenosis, atrial septal defect (ASD), atrioventricular (A-V) canal defect, ductus 
arteriosus, pulmonary stenosis, subaortic stenosis, ventricular septal defect (VSD), valve 
diseases, tuberous sclerosis, scleroderma, obesity, aneurysm, fibromuscular 
dysplasia, stroke, transplantation, myocardial infarction, embolism, cardiovascular 
disorders, bypass surgery, anemia, bleeding disorders, adrenoleukodystrophy, congenital 
adrenal hyperplasia, diabetes, Von Hippel-Lindau (VHL) syndrome, pancreatitis, fertility, 
endometriosis, hypogonadism, hypercalcemia, ulcers, cirrhosis, Hirschsprung's disease, 
Crohn's Disease, appendicitis, hemophilia, hypercoagulation, idiopathic thrombocytopenic 
purpura, ataxia-telangiectasia, lymphedema, allergies, autoimmune disease, 
immunodeficiencies, graft versus host disease (GVHD), 
osteoporosis, arthritis, ankylosing spondylitis, scoliosis, tendonitis, muscular 
dystrophy, Lesch-Nyhan syndrome, myasthenia gravis, Alzheimer's disease, Parkinson's 
disease, Huntington's disease, cerebral palsy, epilepsy, multiple sclerosis, leukodystrophies, 
behavioral disorders, addiction, anxiety, pain, neurodegeneration, endocrine dysfunctions, 
growth and reproductive disorders, systemic lupus erythematosus, asthma, emphysema, 
ARDS, psoriasis, actinic keratosis, acne, hair growth/loss, alopecia, pigmentation disorders, 
cystitis, incontinence, renal artery stenosis, interstitial nephritis, glomerulonephritis, polycystic 
kidney disease, renal tubular acidosis, and IgA nephropathy, or other pathologies or 
conditions. The NOV16 nucleic acid encoding the cation-transporting ATPase-like protein of 
the invention, or fragments thereof, may further be useful in diagnostic applications, wherein 
the presence or amount of the nucleic acid or the protein is to be assessed. 

NOV16 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immunospecifically to the novel NOV16 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX 
Antibodies" section below. The disclosed NOV16 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding the pathology of the disease and in developing new drug targets for various 
disorders. 
NOV17 

A disclosed NOV17 nucleic acid of 1167 nucleotides (also referred to as CG55914-01) 
encoding an acyl CoA desaturase-like protein is shown in Table 17A. Putative untranslated 
regions upstream and/or downstream from the coding region, if any, are underlined, and the 
start and stop codons are in bold letters. 



Table 17A. NOV17 nucleotide sequence (SEQ ID NO:43). 



ATGGGGCAGTTTTATAGGACTGGGGCAGGCAGTGGAAAGTTACAGTTAAAGGTGGTTATC 

TATTGTCAGCTGAGGAGGGATCACAAGGTGAATGGTGAGGAGATCATAAGACTCATTGTC 

CAAAAGAGGAATGTCACGAGGTCAATCGATCGATCAGTTGGGGCAGGGCAGGAACAAGTC 

ATAATGGAATGTCTTAAGCCTTTTTACTTCAACTATCCATCTCTAGACAGTGAGGTCCTG 

GATGATGACAGAGCCATAGATGGAAAAGACACCATTATTCTGGTCTATAAAGAACTGTCT 

AGGGACTTGGCATCCTGTGTCCCAGCCACTCCAGCTGTGGCTGAAAGGGTCCAAGGTACA 

GTTCAGGCCATGGCTTCAAAGGGTGCAAGCCCCAAGTCTCGGCAGCTTTCACAAGGTGTT 

AAGCCTGGCAGTACAGAAGGGAAATGTGGGTTdGAGCCCCAACACAGAGCCCCACTGGGG 

ACACTGCCTAGTGGAGCTTTGAGAAGAGGGCCACCATTCTCCAGACCCCAGAATGGTAGA 

CCCACTGACAGCTTGCACTATGCACTTGGAAAAGACACAGACACTCAACACCAGCCCATG 

AAAGCAGCCAGAAGGGAGGCTGTACCCTGCACAGCTACAGGGGCAGAGCTGCCCAAGACC 

ATGGGAACCCAACTGTTGCATCAGCATGACCCAGATGTGAGAATTGGAGTCAAAGAAGAT 

CATTTTGGAGCTTTAAGATTTGACTGTCCTTCTAGATTTTGGACTTACATGAGGACCCCA 

GCCTTGCTGCTCTGCCCATTGACCTCTGCCACCCTCCATACCGGGTGTGAGTTACCACCA 

GAAGAAGTCTGTGGCAGCAGCAGTCTTTGCCTTATTCTGACAGTGCCAAAGTGTGTTTGG 

CCCTACCAACAACTGCAAGCCCTTTCCTTGCTTTACTTTAGCTTTGGCTCCAGAAGGCTC 

ATTGCTTTGAAATGTAACATATCCTGGTCCAACTACATCCGATTCCATGGGTCTGCATCA 

TTATCTCCAAAACCTTCCAGTTGCATTGCTGTCATGTTTGTGTTAAAGAGACTGATGAGT 

CTGGACTATGACCGGAAGAAAGTCTCCACGGCCGCCATCTTGGCCAGGATTAAAAGAACT 

GGAGATGGAAGCTACAAGAGTGGCTGA 



In a search of public sequence databases, the NOV17 nucleic acid sequence, located on 
chromosome 5, has 316 of 381 bases (82%) identical to a gb:GENBANK-ID:AK000899|acc:AK000899.1 
mRNA from Homo sapiens (Homo sapiens cDNA FLJ10037 
fis, clone HEMBA1000968). Public nucleotide databases include all GenBank databases and 
the GeneSeq patent database. 
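
The identity figures quoted for these database hits are simple ratios of matching positions to aligned positions. The following Python sketch shows that arithmetic. The function name `percent_identity` is hypothetical, and the use of truncation rather than rounding is an assumption inferred from the quoted figure (316/381 is about 82.9%, reported as 82%), not a documented property of the search tool.

```python
def percent_identity(matches: int, aligned_length: int) -> int:
    """Whole-number percent identity, truncated toward zero.

    Truncation (not rounding) is an assumption chosen because it
    reproduces the figure quoted in the text for 316 of 381 bases.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= matches <= aligned_length:
        raise ValueError("matches must lie between 0 and aligned_length")
    return (100 * matches) // aligned_length


if __name__ == "__main__":
    # Nucleotide hit quoted for NOV17: 316 identical bases over 381 aligned.
    print(percent_identity(316, 381))
```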

The disclosed NOV17 polypeptide (SEQ ID NO:44) encoded by SEQ ID NO:43 has 
388 amino acid residues and is presented in Table 17B using the one-letter amino acid code. 
SignalP, PSORT and/or Hydropathy results predict that NOV17 has no signal peptide and is 
likely to be localized in the cytoplasm with a certainty of 0.6500. 



Table 17B. Encoded NOV17 protein sequence (SEQ ID NO:44). 



MGQFYRTGAGSGKLQLKVVIYCQLRRDHKVNGEEIIRLIVQKRNVTRSIDRSVGAGQEQV 
IMECLKPFYFNYPSLDSEVLDDDRAIDGKDTIILVYKELSRDLASCVPATPAVAERVQGT 
VQAMASKGASPKSRQLSQGVKPGSTEGKCGLEPQHRAPLGTLPSGALRRGPPFSRPQNGR 
PTDSLHYALGKDTDTQHQPMKAARREAVPCTATGAELPKTMGTQLLHQHDPDVRIGVKED 
HFGALRFDCPSRFWTYMRTPALLLCPLTSATLHTGCELPPEEVCGSSSLCLILTVPKCVW 
PYQQLQALSLLYFSFGSRRLIALKCNISWSNYIRFHGSASLSPKPSSCIAVMFVLKRLMS 
LDYDRKKVSTAAILARIKRTGDGSYKSG 
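
The relationship between a nucleotide table and its encoded protein table can be checked by translating the coding sequence codon by codon. The sketch below is a minimal illustration using the standard genetic code. The names `CODON_TABLE` and `translate` are hypothetical, and the sketch assumes the reading frame starts at the first base supplied.

```python
# Standard genetic code, indexed by base order T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base T
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate DNA in frame 1; whitespace is ignored, unknown codons become 'X'."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "X")
        if residue == "*" and to_stop:
            break  # stop codon reached
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    # First and last lines of the NOV17 nucleotide sequence in Table 17A.
    first_line = "ATGGGGCAGTTTTATAGGACTGGGGCAGGCAGTGGAAAGTTACAGTTAAAGGTGGTTATC"
    last_line = "GGAGATGGAAGCTACAAGAGTGGCTGA"
    print(translate(first_line))
    print(translate(last_line))
```

A coding sequence of 1167 nucleotides contains 389 codons, and the final codon is a stop codon, which is consistent with the stated length of 388 residues.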



A search of sequence databases reveals that the NOV 17 amino acid sequence has 25 of 
30 amino acid residues (83%) identical to, and 27 of 30 amino acid residues (90%) similar to, 
the 359 amino acid residue ptnr:SWISSNEW-ACC:O00767 protein from Homo sapiens 
(Human) (ACYL-COA DESATURASE (EC 1.14.99.5) (STEAROYL-COA DESATURASE) 
(FATTY ACID DESATURASE) (DELTA(9)-DESATURASE)). Public amino acid databases 
include the GenBank databases, SwissProt, PDB and PIR. 

NOV 17 is expressed in at least Adipose, Adrenal Gland/Suprarenal gland, Aorta, 
Appendix, Artery, Bone, Bone Marrow, Brain, Bronchus, Buccal mucosa, Cartilage, Cerebral 
Medulla/Cerebral white matter, Cervix, Cochlea, Colon, Cornea, Coronary Artery, Dermis, 
Epidermis, Foreskin, Frontal Lobe, Gall Bladder, Hair Follicles, Heart, Hippocampus, 
Hypothalamus, Kidney, Larynx, Left cerebellum, Liver, Lung, Lung Pleura, Lymph node, 
Lymphoid tissue, Mammary gland/Breast, Myometrium, Ovary, Pancreas, Parathyroid Gland, 
Parietal Lobe, Parotid Salivary glands, Peripheral Blood, Pharynx, Pituitary Gland, Placenta, 
Prostate, Respiratory Bronchiole, Retina, Right Cerebellum, Salivary Glands, Skeletal Muscle, 
Skin, Small Intestine, Spinal Cord, Spleen, Stomach, Substantia Nigra, Synovium/Synovial 
membrane, Temporal Lobe, Testis, Thalamus, Thymus, Thyroid, Trachea, Umbilical Vein, 
Urinary Bladder, Uterus, Vein, Vulva. This information was derived by determining the tissue 
sources of the sequences that were included in the invention including but not limited to 
SeqCalling sources, Public EST sources, Literature sources, and/or RACE sources. 

The disclosed NOV 17 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 17C. 
Table 17C. BLAST results for NOV17 

Gene Index/Identifier                          Protein/Organism                                        Length (aa)  Identity (%)    Positives (%)   Expect
gi|14756057|ref|XP_[illegible]                 hypothetical [illegible] [Homo sapiens]                 222          47/74           53/74           2e-18
gi|18572532|ref|XP_088501.1| (XM_088501)       KIAA1161 protein [Homo sapiens]                         188          59/145 (40%)    72/145 (48%)    7e-17
gi|14722528|ref|XP_033352.1| (XM_033352)       similar to hypothetical protein [illegible]             129          45/94 (47%)     56/94 (58%)     2e-15
                                               [Homo sapiens]
gi|16588389|gb|AAL26787.1|AF304442_1           B lymphocyte activation-related protein BC-1514         130          48/126 (38%)    56/126 (44%)    3e-10
(AF304442)                                     [Homo sapiens]
gi|18562020|ref|XP_087778.1| (XM_087778)       hypothetical protein XP_087778 [Homo sapiens]           216          36/68 (52%)     39/68 (56%)     2e-8



Fatty acid desaturases (EC 1.14.99.-) are enzymes that catalyze the insertion of a double 
bond at the delta position of fatty acids. There are two distinct families of fatty acid 
desaturases which do not seem to be evolutionarily related. 

Family 1 is composed of: 

- Stearoyl-CoA desaturase (SCD) (EC 1.14.99.5). SCD is a key regulatory enzyme of 
unsaturated fatty acid biosynthesis. SCD introduces a cis double bond at the delta(9) position of 
fatty acyl-CoAs, yielding products such as palmitoleoyl- and oleoyl-CoA. SCD is a membrane-bound 
enzyme that is thought to function as a part of a multienzyme complex in the endoplasmic 
reticulum of vertebrates and fungi. 

Family 2 is composed of: 

- Plant stearoyl-acyl-carrier-protein desaturase (EC 1.14.99.6). These enzymes catalyze 
the introduction of a double bond at the delta(9) position of stearoyl-ACP to produce 
oleoyl-ACP. This enzyme is responsible for the conversion of saturated fatty acids to 
unsaturated fatty acids in the synthesis of vegetable oils. 

- Cyanobacterial DesA, an enzyme that can introduce a 
second cis double bond at the delta(12) position of fatty acids bound to membrane 
glycerolipids. DesA is involved in chilling tolerance, because the phase transition temperature 
of the lipids of cellular membranes depends on the degree of unsaturation of the fatty acids of the 
membrane lipids. 
The disclosed NOV17 nucleic acid of the invention encoding an acyl CoA desaturase-like 
protein includes the nucleic acid whose sequence is provided in Table 17A or a fragment 
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may 
be changed from the corresponding base shown in Table 17A while still encoding a protein 
that maintains its acyl CoA desaturase-like activities and physiological functions, or a 
fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences 
are complementary to those just described, including nucleic acid fragments that are 
complementary to any of the nucleic acids just described. The invention additionally includes 
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include 
chemical modifications. Such modifications include, by way of nonlimiting example, 
modified bases, and nucleic acids whose sugar phosphate backbones are modified or 
derivatized. These modifications are carried out at least in part to enhance the chemical 
stability of the modified nucleic acid, such that they may be used, for example, as antisense 
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic 
acids, and their complements, up to about 18 percent of the bases may be so changed. 
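
Two notions in the preceding paragraph lend themselves to a short illustration: the complement of a nucleic acid, and the fraction of bases changed in a variant. The Python sketch below is a minimal example under stated assumptions. The function names `reverse_complement` and `fraction_changed` are hypothetical, the complement is taken as the reverse complement of a DNA strand, and variants are compared position by position, so only substitutions (not insertions or deletions) are counted.

```python
# Watson-Crick pairing for DNA; case is preserved.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def fraction_changed(reference: str, variant: str) -> float:
    """Fraction of positions at which two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length (substitutions only)")
    if not reference:
        raise ValueError("sequences must be non-empty")
    differences = sum(1 for a, b in zip(reference.upper(), variant.upper()) if a != b)
    return differences / len(reference)


if __name__ == "__main__":
    reference = "ATGGGGCAGTTTTATAGG"   # first 18 bases of Table 17A
    variant = "ATGGGACAGTTCTATAGG"     # hypothetical variant with two substitutions
    print(reverse_complement(reference))
    print(fraction_changed(reference, variant))
```

A variant would fall within the stated tolerance when `fraction_changed` does not exceed the percentage given in the text, expressed as a fraction.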

The disclosed NOV17 protein of the invention includes the acyl CoA desaturase-like 
protein whose sequence is provided in Table 17B. The invention also includes a mutant or 
variant protein any of whose residues may be changed from the corresponding residue shown 
in Table 17B while still maintaining its acyl CoA desaturase-like 
activities and physiological functions, or a functional fragment thereof. In the mutant or 
variant protein, up to about 17 percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or 
(Fab)2, that bind immunospecifically to any of the proteins of the invention. 

The above defined information for this invention suggests that this acyl CoA 
desaturase-like protein (NOV17) may function as a member of an "acyl CoA desaturase 
family". Therefore, the NOV17 nucleic acids and proteins identified here may be useful in 
potential therapeutic applications implicated in (but not limited to) various pathologies and 
disorders as indicated below. The potential therapeutic applications for this invention include, 
but are not limited to: protein therapeutic, small molecule drug target, antibody target 
(therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic 
marker, gene therapy (gene delivery/gene ablation), research tools, and tissue regeneration in vivo 
and in vitro of all tissues and cell types composing (but not limited to) those defined here. 

The NOV17 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the acyl CoA desaturase-like 
protein (NOV17) may be useful in gene therapy, and the acyl CoA desaturase-like protein 
(NOV17) may be useful when administered to a subject in need thereof. By way of 
nonlimiting example, the compositions of the present invention will have efficacy for 
treatment of patients suffering from Cardiomyopathy, Atherosclerosis, Hypertension, 
Congenital heart defects, Aortic stenosis, Atrial septal defect (ASD), Atrioventricular (A-V) 
canal defect, Ductus arteriosus, Pulmonary stenosis, Subaortic stenosis, Ventricular septal 
defect (VSD), valve diseases, Tuberous sclerosis, Scleroderma, Obesity, Transplantation, Von 
Hippel-Lindau (VHL) syndrome, Cirrhosis, Alzheimer's disease, Stroke, hypercalcemia, Parkinson's 
disease, Huntington's disease, Cerebral palsy, Epilepsy, Lesch-Nyhan syndrome, Multiple 
sclerosis, Ataxia-telangiectasia, Leukodystrophies, Behavioral disorders, Addiction, Anxiety, 
Pain, Neuroprotection, or other pathologies or conditions. The NOV17 nucleic acid encoding 
the acyl CoA desaturase-like protein of the invention, or fragments thereof, may further be 
useful in diagnostic applications, wherein the presence or amount of the nucleic acid or the 
protein is to be assessed. 

NOV17 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immunospecifically to the novel NOV17 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX 
Antibodies" section below. The disclosed NOV17 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding the pathology of the disease and in developing new drug targets for various 
disorders. 
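
Hydrophilic regions of the kind mentioned above are commonly located with a sliding-window hydropathy plot. The sketch below is one minimal way to do this, using the Kyte-Doolittle hydropathy scale. It is an illustration only: the text does not say which scale, window size or cutoff is used, so the scale, the default window of 9 residues, the cutoff of -1.5 and the function names are all assumptions.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list:
    """Mean hydropathy of every full window along the sequence."""
    if window < 1:
        raise ValueError("window must be at least 1")
    scores = [KYTE_DOOLITTLE[residue] for residue in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophilic_windows(protein: str, window: int = 9, cutoff: float = -1.5) -> list:
    """0-based start positions of windows whose mean hydropathy is below cutoff."""
    profile = hydropathy_profile(protein, window)
    return [start for start, score in enumerate(profile) if score < cutoff]


if __name__ == "__main__":
    # C-terminal residues encoded by the last two lines of Table 17A.
    segment = "LDYDRKKVSTAAILARIKRTGDGSYKSG"
    for start in hydrophilic_windows(segment):
        print(start, segment[start:start + 9])
```

Windows with strongly negative mean values are candidate surface-exposed stretches; choosing among them as immunogens would still require the methods described in the "Anti-NOVX Antibodies" section.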

NOV18 

A disclosed NOV18 nucleic acid of 853 nucleotides (also referred to as CG57328-01) 
encoding a myo-inositol-1 (or 4) monophosphatase-like protein is shown in Table 18A. The 
start and stop codons are in bold letters. 



Table 18A. NOV18 nucleotide sequence (SEQ ID NO:45). 



ATGGCTGATCCTTGGCAGGAATGCATGGATTATGCAGTAACCCTAGCAAGACAAGCTGGA 
GAGGTGGTTCATGATGCTCTTAAAAATGAAGTGAATGTTATCCTGAAAGGTTCTCCAGTT 
GATTTGGTAACTGCTACTGACCAAAAAGTTGAAAAAATGCTTATCTCTTCCATAAAGGAA 
AAGTATCCATCTCATAGGTATTTTTTTATTGTGAGGAATCTGGCAGCTGGGGAAAAAGGT 
GTCTTAACTGACAACCCTACGTGGATCATTGACCCTATTGATGGAACAACTAAGTTTGTC 
CATAGATTTCCTTTTGTAGCTGTTTCGATTGGCCTTGTTGTAAATAAGAAGGTAGAATTT 
GGAGTTGTGTACAGTTGTGTGGAAGACAAGAGGTACACTGTCAGGAAAGGAAAAGGTGCC 
TTTTATAATGGTCAAAAACTACAGGTTTCACAAGAAGGTGATATTACCAAATCACTCTTG 
GTGACCGAGCTGGGCTATTGCAGAACATCAGAAATTGTAAGAACTATTCTTTCCAATATG 
GAAAAGCTTTCTTGCATTCCTATTCACGGTATCCAGAGTGTTGGAACAGCAGCTACTAAT 
ATGTGCATTGCGGCAAGTGGAGGAGCAGAGGCATTTTATGAAATGGGAATTCACTGCTGG 
GATATTGCAGTAGCTGCCATTATTGTTACTGAAGCTGGTGGCGTGCTAATGGATGTTACT 
GGTGGACCATTCCATTTAATGTCACGGAGAATAATTGCTGCAAATTGTACAGCATTAGCA 
GAAAGGATAGCCAAAGAAATTCAGGTAGCACCTTTTCAATGAGATGATGAAGATTAATTA 
CAGCAGCCTCATA 




In a search of public sequence databases, the NOV18 nucleic acid sequence, located on 
chromosome 8, has 761 of 853 bases (89%) identical to a gb:GENBANK-ID:AF042729|acc:AF042729.2 
mRNA from Homo sapiens (Homo sapiens lithium-sensitive 
myo-inositol monophosphatase A1 (IMPA1) mRNA, complete cds). Public nucleotide 
databases include all GenBank databases and the GeneSeq patent database. 

The disclosed NOV18 polypeptide (SEQ ID NO:46) encoded by SEQ ID NO:45 has 
273 amino acid residues and is presented in Table 18B using the one-letter amino acid code. 
SignalP, PSORT and/or Hydropathy results predict that NOV18 has no signal peptide and is 
likely to be localized in the cytoplasm with a certainty of 0.6500. 



Table 18B. Encoded NOV18 protein sequence (SEQ ID NO:46). 



MADPWQECMDYAVTLARQAGEVVHDALKNEVNVILKGSPVDLVTATDQKVEKMLISSIKE 
KYPSHRYFFIVRNLAAGEKGVLTDNPTWIIDPIDGTTKFVHRFPFVAVSIGLVVNKKVEF 
GVVYSCVEDKRYTVRKGKGAFYNGQKLQVSQEGDITKSLLVTELGYCRTSEIVRTILSNM 
EKLSCIPIHGIQSVGTAATNMCIAASGGAEAFYEMGIHCWDIAVAAIIVTEAGGVLMDVT 
GGPFHLMSRRIIAANCTALAERIAKEIQVAPFQ 



A search of sequence databases reveals that the NOV18 amino acid sequence has 222 
of 273 amino acid residues (81%) identical to, and 240 of 273 amino acid residues (87%) 
similar to, the 277 amino acid residue ptnr:SWISSNEW-ACC:P29218 protein from Homo 
sapiens (Human) (MYO-INOSITOL-1(OR 4)-MONOPHOSPHATASE (EC 3.1.3.25) 
(IMPASE) (IMP) (INOSITOL MONOPHOSPHATASE) (LITHIUM-SENSITIVE MYO-INOSITOL 
MONOPHOSPHATASE A1)). Public amino acid databases include the GenBank 
databases, SwissProt, PDB and PIR. 

NOV18 is expressed in at least Brain and Lung. This information was derived by 
determining the tissue sources of the sequences that were included in the invention including 
but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE 
sources. 
The disclosed NOV18 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 18C. 



Table 18C. BLAST results for NOV18 

Gene Index/Identifier                          Protein/Organism                                        Length (aa)  Identity (%)    Positives (%)   Expect
gi|18570157|ref|XP_095533.1| (XM_095533)       hypothetical protein XP_095533 [Homo sapiens]           512
gi|5031789|ref|NP_005527.1| (NM_005536)        inositol(myo)-1(or 4)-monophosphatase 1                 277          222/273 (81%)   240/273 (87%)   e-124
                                               [Homo sapiens]
gi|3914092|sp|P97697|MYOP_RAT                  Myo-inositol-1(or 4)-monophosphatase (IMPase) (IMP)     277          222/273 (81%)   240/273 (87%)   e-124
                                               (Inositol monophosphatase) (Lithium-sensitive
                                               myo-inositol monophosphatase A1)
gi|443382|pdb|2HHM|A                           Chain A, Human Inositol Monophosphatase                 276
                                               (E.C.3.1.3.25) Dimer Complex With Gadolinium
                                               And Sulfate
gi|127716|sp|P20456|MYOP_BOVIN                 Myo-inositol-1(or 4)-monophosphatase (IMPase) (IMP)     277          209/273 (76%)   236/273 (85%)   e-119
                                               (Inositol monophosphatase) (Lithium-sensitive
                                               myo-inositol monophosphatase A1)



Table 18D lists the domain descriptions from DOMAIN analysis results against 
NOV18. This indicates that the NOV18 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 18D. Domain Analysis of NOV18 

gnl|Pfam|pfam00459, inositol_P, Inositol monophosphatase family 

CD-Length = 270 residues, 97.0% aligned 

Score = 238 bits (607), Expect = 3e-64 
Myo-inositol-1(or 4)-monophosphatase catalyzes the reaction: 

myo-inositol 1-monophosphate + H(2)O <=> myo-inositol + phosphate 

The enzyme acts on both enantiomers of myo-inositol 1-phosphate and myo-inositol 4-phosphate. 
It does not act on inositol bisphosphates, trisphosphates or tetrakisphosphates. It has been 
shown that several proteins share two sequence motifs. Two of these proteins are enzymes of 
the inositol phosphate second messenger signalling pathway: 

- Vertebrate and plant inositol monophosphatase (EC 3.1.3.25). 
- Vertebrate inositol polyphosphate 1-phosphatase (EC 3.1.3.57). 

Other proteins are: 

- Bacterial protein cysQ. CysQ could help to control the pool of PAPS (3'-phosphoadenosine 
5'-phosphosulfate), or be useful in sulfite synthesis. 
- Escherichia coli protein suhB. Mutations in suhB result in the enhanced synthesis of heat 
shock sigma factor (htpR). 
- Neurospora crassa protein Qa-X. Probably involved in quinate metabolism. 
- Emericella nidulans protein qutG. Probably involved in quinate metabolism. 
- Yeast protein HAL2/MET22, involved in salt tolerance as well as methionine biosynthesis. 
- Yeast hypothetical protein YHR046c. 
- Caenorhabditis elegans hypothetical protein F13G3.5. 
- A Rhizobium leguminosarum hypothetical protein encoded upstream of the pss 
gene for exopolysaccharide synthesis. 
- Methanococcus jannaschii hypothetical protein MJ0109. 

It is suggested that these proteins may act by enhancing the synthesis or degradation of 
phosphorylated messenger molecules. From the X-ray structure of human inositol 
monophosphatase, it seems that some of the conserved residues are involved in binding a 
metal ion and/or the phosphate group of the substrate. 

The disclosed NOV18 nucleic acid of the invention encoding a myo-inositol-1 (or 4) 
monophosphatase-like protein includes the nucleic acid whose sequence is provided in Table 
18A or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of 
whose bases may be changed from the corresponding base shown in Table 18A while still 
encoding a protein that maintains its myo-inositol-1 (or 4) monophosphatase-like activities and 
physiological functions, or a fragment of such a nucleic acid. The invention further includes 
nucleic acids whose sequences are complementary to those just described, including nucleic 
acid fragments that are complementary to any of the nucleic acids just described. The 
invention additionally includes nucleic acids or nucleic acid fragments, or complements 
thereto, whose structures include chemical modifications. Such modifications include, by way 
of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones 
are modified or derivatized. These modifications are carried out at least in part to enhance the 
chemical stability of the modified nucleic acid, such that they may be used, for example, as 
antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or 
variant nucleic acids, and their complements, up to about 11 percent of the bases may be so 
changed. 

The disclosed NOV18 protein of the invention includes the myo-inositol-1 (or 4) 
monophosphatase-like protein whose sequence is provided in Table 18B. The invention also 
includes a mutant or variant protein any of whose residues may be changed from the 
corresponding residue shown in Table 18B while still maintaining its 
myo-inositol-1 (or 4) monophosphatase-like activities and physiological functions, or a 
functional fragment thereof. In the mutant or variant protein, up to about 19 percent of the 
residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or 
(Fab)2, that bind immunospecifically to any of the proteins of the invention. 

The above defined information for this invention suggests that this myo-inositol-1 (or 
4) monophosphatase-like protein (NOV18) may function as a member of a "myo-inositol-1 (or 
4) monophosphatase family". Therefore, the NOV18 nucleic acids and proteins identified here 
may be useful in potential therapeutic applications implicated in (but not limited to) various 
pathologies and disorders as indicated below. The potential therapeutic applications for this 
invention include, but are not limited to: protein therapeutic, small molecule drug target, 
antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or 
prognostic marker, gene therapy (gene delivery/gene ablation), research tools, and tissue 
regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to) 
those defined here. 

The NOV18 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the myo-inositol-1 (or 4) 
monophosphatase-like protein (NOV18) may be useful in gene therapy, and the myo-inositol-1 
(or 4) monophosphatase-like protein (NOV18) may be useful when administered to a subject 
in need thereof. By way of nonlimiting example, the compositions of the present invention 
will have efficacy for treatment of patients suffering from Systemic lupus erythematosus, 
Autoimmune disease, Asthma, Emphysema, Scleroderma, allergy, Von Hippel-Lindau (VHL) 
syndrome, Alzheimer's disease, Stroke, Tuberous sclerosis, hypercalcemia, Parkinson's 
disease, Huntington's disease, Cerebral palsy, Epilepsy, Lesch-Nyhan syndrome, Multiple 
sclerosis, Ataxia-telangiectasia, Leukodystrophies, Behavioral disorders, Addiction, Anxiety, 
Pain, Neuroprotection, or other pathologies or conditions. The NOV18 nucleic acid encoding 
the myo-inositol-1 (or 4) monophosphatase-like protein of the invention, or fragments thereof, 
may further be useful in diagnostic applications, wherein the presence or amount of the nucleic 
acid or the protein is to be assessed. 

NOV18 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immunospecifically to the novel NOV18 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX 
Antibodies" section below. The disclosed NOV18 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding the pathology of the disease and in developing new drug targets for various 
disorders. 

NOV19 

A disclosed NOV19 nucleic acid of 2071 nucleotides (also referred to as CG57358-01) 
encoding a spinster-like protein is shown in Table 19A. The start and stop codons are in 
bold letters. 



Table 19A. NOV19 nucleotide sequence (SEQ ID NO:47). 

CCCCCCGCCGCCCCGATCCGGGCCGGCATGATGTGCCTGGAATGCGCCTCGGCGGCGGCG 
GGCGGCGCGGAGGAGGAGGAGGCGGACGCGGAGCGGCGGCGCCGGCGCCGGGGGGCGCAG 
CGAGGGGCTGGCGGTAGCGGTTGCTGCGGGGCGCGGGGCGCGGGCGGCGCTGGAGTCTCG 
GCCGCGGGCGATGAGGTGCAGACGCTGTCGGGCAGCGTAAGGCGGGCCCCGACCGGACCC 
CCCGGCACCCCCGGCACCCCCGGCTGCGCAGCTACTGCAAAGGGCCCCGGCGCTCAGCAG 
CCCAAACCGGCCAGCTTGGGCCGCGGGCGGGGGGCAGCCGCCGCCATCCTCAGCTTGGGC 
AACGTGCTCAACTACCTGGACAGGTACACCGTGGCAGGCGTCCTTCTGGACATCCAGCAG 
CACTTTGGGGTCAAGGACCGAGGCGCCGGCCTGCTGCAGTCAGTGTTCATCTGTAGCTTC 
ATGGTGGCTGCCCCCATCTTCGGCTACCTGGGCGACCGCTTCAACAGGAAGGTGATTCTC 
AGCTGCGGCATTTTCTTCTGGTCGGCCGTCACCTTCTCCAGCTCCTTCATTCCCCAGCAG 
TACTTCTGGCTGCTGGTCCTGTCCCGGGGGCTGGTGGGCATCGGGGAGGCCAGCTACTCC 
ACCATCGCCCCCACTATCATTGGCGACCTCTTCACCAAGAACACGCGTACGCTCATGCTG 
TCCGTCTTCTACTTCGCCATCCCACTGGGCAGTGGCCTGGGCTACATTACTGGCTCCAGC 
GTGAAGCAGGCAGCCGGAGACTGGCACTGGGCATTGCGGGTGTCCCCTGTCCTGGGCATG 
ATCACAGGAACACTCATCCTCATTCTGGTCCCAGCCACTAAAAGGGGTCATGCCGACCAG 
CTCGGGGACCAGCTCAAGGCCCGGACCTCATGGCTCCGAGATATGAAGGCCCTGATTCGA 
AACCGCAGCTACGTCTTCTCCTCCCTGGCCACGTCGGCTGTCTCCTTCGCCACGGGGGCC 
CTGGGCATGTGGATCCCGCTCTACCTGCACCGCGCCCAAGTTGTGCAGAAGACAGCAGAG 
ACGTGCAACAGCCCGCCCTGTGGGGCCAAGGACAGCCTCATCTTTGGGGCCATCACCTGC 
TTTACGGGATTTCTGGGCGTGGTCACGGGGGCAGGAGCCACGCGCTGGTGCCGCCTGAAG 
ACCCAGCGGGCCGACCCACTGGTGTGTGCCGTGGGCATGCTGGGCTCTGCCATCTTCATC 
TGCCTGATCTTCGTGGCTGCCAAGAGCAGCATCGTAGGAGCCTATATCTGTATCTTCGTC 
GGGGAGACGCTGCTGTTTTCTAACTGGGCCATCACTGCAGACATCCTCATGTACGTGGTC 
ATCCCCACGCGGCGCGCCACTGCCGTGGCCTTGCAGAGCTTCACCTCCCACCTGCTGGGG 
GACGCCGGGAGCCCCTACCTCATTGGCTTTATCTCAGACCTGATCCGCCAGAGCACTAAG 
GACTCCCCGCTCTGGGAGTTCCTGAGCCTGGGCTACGCGCTCATGCTCTGCCCTTTCGTC 
GTGGTCCTGGGCGGCATGTTCTTCCTCGCCACTGCGCTCTTCTTCGTCAGCGACCGCGCC 
AGGGCTGAGCAGCACCTGGGGGAGAGACGGGCGGGGGTCAGGGTGGTGCATCAGCGGGGG 
CCGGGCCCGGGCACTGCTCTGGCACATCGTGTCGTGGGGGCCAGCTGACCGGAGGTGCTG 
GGCAGGGACCTCGTCAGCCCCAGGGGGAGATGGGAGAGCCCAGGGGTGGGGAGAGGAGAG 
AGAGAGGAGTAAAGAGGAAAGGAGAAAGAAGTCAGAAAGTAAGAGGAAGGGGAGGGGCCC 
CAGCTTTGAAAACCACTAAGTCCAGAGACAAACCCAAGTCTGGATCCACCAGACACCCCC 
GTGGCCTCCCACAGCTCCAGGCTGACCCTGGCACTGGGCCTCAGGGCTGGACCCCAGCAA 
CCAGTGGGTGCACTGAGTGCATGGGAGGTCTGTACCCTCCCCGCCCCACCCCAGGGCAGG 
GCTCACGGTGGCTATCACGGTCCCTGCTTCC 



In a search of public sequence databases, the NOV19 nucleic acid sequence, located on 
chromosome 17, has 290 of 431 bases (67%) identical to a gb:GENBANK-ID:E12646|acc:E12646.1 
mRNA from Homo sapiens (cDNA encoding cell growth inhibiting factor). Public nucleotide 
databases include all GenBank databases and the GeneSeq patent database. 
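
The identity figure quoted above is a simple ratio of matching positions to aligned positions. The following Python sketch reproduces that arithmetic. The function name `percent_identity` is hypothetical, and truncation to a whole percent is an assumption that happens to agree with the 67% reported here; the source does not say how the figure was rounded.

```python
def percent_identity(matches: int, aligned: int) -> int:
    """Return matches/aligned as a whole-number percentage.

    Assumption: the value is truncated, not rounded to nearest.
    """
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    if not 0 <= matches <= aligned:
        raise ValueError("matches must lie between 0 and the aligned length")
    return (100 * matches) // aligned


# NOV19 nucleotide hit quoted in the text: 290 of 431 bases.
print(percent_identity(290, 431))
```

The same helper can be applied to the "identical" and "similar" residue counts quoted for the encoded polypeptide.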

The disclosed NOV19 polypeptide (SEQ ID NO:48) encoded by SEQ ID NO:47 has 
566 amino acid residues and is presented in Table 19B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV19 has no signal peptide and is 
likely to be localized at the plasma membrane with a certainty of 0.6000. 
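
The relationship between the nucleotide sequence of Table 19A and the polypeptide of Table 19B is a conceptual translation of the open reading frame. A minimal Python sketch of that step follows, assuming the standard genetic code and assuming that the first ATG in the sequence is the start codon (which holds for the first line of Table 19A, but is not true of every cDNA). The names `CODON_TABLE` and `translate_first_orf` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_first_orf(dna: str) -> str:
    """Translate from the first ATG up to (not including) the first in-frame stop."""
    seq = "".join(dna.split()).upper()  # drop whitespace left over from formatting
    start = seq.find("ATG")
    if start < 0:
        return ""
    residues = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an unreadable codon
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# First 60 bases of the NOV19 sequence (Table 19A).
first_line = "CCCCCCGCCGCCCCGATCCGGGCCGGCATGATGTGCCTGGAATGCGCCTCGGCGGCGGCG"
print(translate_first_orf(first_line))
```

Applied to the complete 2071-nucleotide sequence, a translation of this kind would be expected to yield the 566-residue polypeptide described above.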



Table 19B. Encoded NOV19 protein sequence (SEQ ID NO:48). 

MMCLECASAAAGGAEEEEADAERRRRRRGAQRGAGGSGCCGARGAGGAGVSAAGDEVQTL 
SGSVRRAPTGPPGTPGTPGCAATAKGPGAQQPKPASLGRGRGAAAAILSLGNVLNYLDRY 
TVAGVLLDIQQHFGVKDRGAGLLQSVFICSFMVAAPIFGYLGDRFNRKVILSCGIFFWSA 
VTFSSSFIPQQYFWLLVLSRGLVGIGEASYSTIAPTIIGDLFTKNTRTLMLSVFYFAIPL 
GSGLGYITGSSVKQAAGDWHWALRVSPVLGMITGTLILILVPATKRGHADQLGDQLKART 
SWLRDMKALIRNRSYVFSSLATSAVSFATGALGMWIPLYLHRAQVVQKTAETCNSPPCGA 
KDSLIFGAITCFTGFLGVVTGAGATRWCRLKTQRADPLVCAVGMLGSAIFICLIFVAAKS 
SIVGAYICIFVGETLLFSNWAITADILMYVVIPTRRATAVALQSFTSHLLGDAGSPYLIG 
FISDLIRQSTKDSPLWEFLSLGYALMLCPFVVVLGGMFFLATALFFVSDRARAEQHLGER 
RAGVRVVHQRGPGPGTALAHRVVGAS 



A search of sequence databases reveals that the NOV19 amino acid sequence has 268 
of 495 amino acid residues (54%) identical to, and 330 of 495 amino acid residues (66%) 
similar to, the 528 amino acid residue ptnr:TREMBLNEW-ACC:AAG43830 protein from 
Homo sapiens (Human) (SPINSTER-LIKE PROTEIN). Public amino acid databases include 
the GenBank databases, SwissProt, PDB and PIR. 

NOV19 is expressed in at least brain and heart. This information was derived by 
determining the tissue sources of the sequences that were included in the invention including 
but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE 
sources. 

The disclosed NOV19 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 19C. 



Table 19C. BLAST results for NOV19 


Gene Index/Identifier                            Protein/Organism                             Length (aa)  Identity (%)   Positives (%)  Expect
gi|15079262|gb|AAH11467.1|AAH11467 (BC011467)    (protein for IMAGE:3154539) [Mus musculus]   590          514/539 (95%)  523/539 (96%)  0.0
gi|18448989|gb|AAL69987.1|AF465772_1 (AF465772)  not really started [Danio rerio]             506          248/431 (57%)  312/431 (71%)  e-133
gi|13544043|gb|AAH06156.1|AAH06156 (BC006156)    (protein for IMAGE:3627317) [Homo sapiens]   524          266/478 (55%)  327/478 (67%)  e-129
gi|14042968|ref|NP_114427.1| (NM_032038)         spinster-like protein [Homo sapiens]         528          268/495 (54%)  330/495 (66%)  e-129
gi|12963795|ref|NP_076201.1| (NM_023712)         spinster-like protein [Mus musculus]         528          266/510 (52%)  330/510 (64%)  e-126



Tables 19D and 19E list the domain descriptions from DOMAIN analysis results against 
NOV19. This indicates that the NOV19 sequence has properties similar to those of other 
proteins known to contain these domains. 



Table 19D. Domain Analysis of NOV19 

gnl|Pfam|pfam00083, sugar_tr, Sugar (and other) transporter. 

CD-Length = 447 residues, 28.2% aligned 

Score = 55.1 bits (131), Expect = 1e-08 



Table 19E. Domain Analysis of NOV19 

gnl|Pfam|pfam03137, OATP_C, Organic Anion Transporter Polypeptide 
(OATP) family, C-terminus. This family consists of several eukaryotic 
Organic-Anion-Transporting Polypeptides (OATPs). Several have been 
identified mostly in human and rat. Different OATPs vary in tissue 
distribution and substrate specificity. Since the numbering of 
different OATPs in particular species was based originally on the 
order of discovery, similarly numbered OATPs in humans and rats did 
not necessarily correspond in function, tissue distribution and 
substrate specificity (in spite of the name, some OATPs also transport 
organic cations and neutral molecules). Thus, Tamai et al. initiated 
the current scheme of using digits for rat OATPs and letters for human 
ones. Prostaglandin transporter (PGT) proteins (e.g. Pfam:Q92959) are 
also considered to be OATP family members. In addition, the 
methotrexate transporter OATK (Pfam:P70502) is closely related to 
OATPs. This family aligns residues towards the C-terminus. The family 
OATP_N aligns residues from similar proteins towards the N-terminus. 
This family also includes several predicted proteins from 
Caenorhabditis elegans and Drosophila melanogaster. This similarity 
was not previously noted. Note: Members of this family are described 
(in the Swiss-Prot database) as belonging to the SLC21 family of 
transporters. 

CD-Length = 375 residues, 18.9% aligned 
Score = 37.7 bits (86), Expect = 0.002 



NOV19 is a homolog of the spinster-like proteins in human and mouse. Spinster is a 
novel membrane protein in Drosophila, mutants of which exhibit accumulation of ceroid 
lipofuscin and neural degeneration (See Nakano et al., Genbank entry for AAG43830.1). 
Accumulation of ceroid lipofuscin occurs in several hereditary disorders that are probably 
related to lysosomal storage defects. The pigment makes fibroblasts in vitro more susceptible 
to oxidative stress, leading to apoptosis (See Terman et al., Exp Gerontol 1999 Sep;34(6):755-70). 
It is also a component of atherogenic lesions in arteries (See Hoffe and Hoppe, Curr Opin 
Lipidol 1995 Oct;6(5):317-25). In neurons, accumulation of the pigment leads to 
neurodegeneration, even leading to death in some cases (See Dyken and Wisniewski, Am J 
Med Genet 1995 Jun 5;57(2):150-4). This is seen both in inherited forms of human ceroid 
lipofuscinoses (See Jagadha et al., Acta Neuropathol (Berl) 1988;75(3):233-40) and in a 
cathepsin-D-deficient mouse model (See Koike et al., J Neurosci 2000 Sep 15;20(18):6898-906). 
In at least one case, neuronal deposition of ceroid lipofuscin was also correlated with 
accumulation in the myocardium (See Jay and Haslam, J Inherit Metab Dis 1995;18(3):359-60). 

The disclosed NOV19 nucleic acid of the invention encoding a spinster-like protein 
includes the nucleic acid whose sequence is provided in Table 19A or a fragment thereof. The 
invention also includes a mutant or variant nucleic acid any of whose bases may be changed 
from the corresponding base shown in Table 19A while still encoding a protein that maintains 
its spinster-like activities and physiological functions, or a fragment of such a nucleic acid. 
The invention further includes nucleic acids whose sequences are complementary to those just 
described, including nucleic acid fragments that are complementary to any of the nucleic acids 
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or 
complements thereto, whose structures include chemical modifications. Such modifications 
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least 
in part to enhance the chemical stability of the modified nucleic acid, such that they may be 
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. 
In the mutant or variant nucleic acids, and their complements, up to about 33 percent of the 
bases may be so changed. 
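
The 33 percent ceiling quoted above coincides with the complement of the 67% nucleotide identity reported for the closest public database hit, and the 46 percent ceiling given for the protein coincides with the complement of its 54% identity. The source does not state that the limits were derived this way, so the Python sketch below only checks the arithmetic under that assumption. The function name `max_variant_percent` is hypothetical.

```python
def max_variant_percent(identity_percent: int) -> int:
    """Complement of a percent identity (assumed relation, not stated in the source)."""
    if not 0 <= identity_percent <= 100:
        raise ValueError("identity must be a percentage between 0 and 100")
    return 100 - identity_percent


# NOV19 identity figures quoted in this section.
print(max_variant_percent(67))  # nucleic acid: 33
print(max_variant_percent(54))  # protein: 46
```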

The disclosed NOV19 protein of the invention includes the spinster-like protein whose 
sequence is provided in Table 19B. The invention also includes a mutant or variant protein 
any of whose residues may be changed from the corresponding residue shown in Table 19B 
while still maintaining its spinster-like activities and physiological 
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 46 
percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or 
(Fab)2, that bind immunospecifically to any of the proteins of the invention. 

The above defined information for this invention suggests that this spinster-like protein 
(NOV19) may function as a member of a "spinster family". Therefore, the NOV19 nucleic 
acids and proteins identified here may be useful in potential therapeutic applications 
implicated in (but not limited to) various pathologies and disorders as indicated below. The 
potential therapeutic applications for this invention include, but are not limited to: protein 
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug 
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene 
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues 
and cell types composing (but not limited to) those defined here. 

The NOV19 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the spinster-like protein 
(NOV19) may be useful in gene therapy, and the spinster-like protein (NOV19) may be useful 
when administered to a subject in need thereof. By way of nonlimiting example, the 
compositions of the present invention will have efficacy for treatment of patients suffering 
from cardiomyopathy, atherosclerosis, hypertension, congenital heart defects, aortic stenosis, 
atrial septal defect (ASD), atrioventricular (A-V) canal defect, ductus arteriosus, pulmonary 
stenosis, subaortic stenosis, ventricular septal defect (VSD), valve diseases, tuberous sclerosis, 
scleroderma, obesity, transplantation, Von Hippel-Lindau (VHL) syndrome, Alzheimer's 
disease, stroke, hypercalcemia, Parkinson's disease, Huntington's disease, cerebral palsy, 
epilepsy, Lesch-Nyhan syndrome, multiple sclerosis, ataxia-telangiectasia, leukodystrophies, 
behavioral disorders, addiction, anxiety, pain, neurodegeneration, cancer, tissue degeneration, 
bacterial/viral/parasitic infection, or other pathologies or conditions. The NOV19 nucleic acid 
encoding the spinster-like protein of the invention, or fragments thereof, may further be useful 
in diagnostic applications, wherein the presence or amount of the nucleic acid or the protein 
is to be assessed. 

NOV19 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immunospecifically to the novel NOV19 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX 
Antibodies" section below. The disclosed NOV19 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 
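
Prediction of hydrophilic regions from a hydrophobicity chart, as mentioned above, can be illustrated with a sliding-window hydropathy profile. The Python sketch below uses the Kyte-Doolittle scale and a 9-residue window. Both choices are assumptions made for illustration, since the source does not name the scale or window it relies on, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy of every window-length stretch of the sequence."""
    seq = "".join(protein.split()).upper()
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def most_hydrophilic_window(protein: str, window: int = 9) -> tuple[int, str, float]:
    """Return (0-based start, peptide, mean score) of the most hydrophilic window."""
    seq = "".join(protein.split()).upper()
    profile = hydropathy_profile(seq, window)
    start = min(range(len(profile)), key=profile.__getitem__)
    return start, seq[start:start + window], profile[start]


# Toy sequence: a lysine-rich stretch flanked by leucines.
print(most_hydrophilic_window("LLLLLLLLLKKKKKKKKKLLLLLLLLL"))
```

Windows with strongly negative mean scores are candidate surface-exposed stretches; choosing an actual immunogen would involve further criteria that this sketch does not model.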

NOV20 

A disclosed NOV20 nucleic acid of 752 nucleotides (also referred to as CG57695-01) 
encoding a casein-like protein is shown in Table 20A. The start and stop codons are in bold 
letters. 



Table 20A. NOV20 nucleotide sequence (SEQ ID NO:49). 

CAAACAACCATGAAGTTCTTCATCTTTACCTGCCTTTTGGCTGTTGCTCTGGCACATCAT 
GAACTTAAACACGTTTACAAGAAAAAAACAAACAACAACGTGAGTGACAAATACAGAAAT 
GTGAAAAACCAGATTTCTTCTCCTCAGGAGGACAAAGTTAGAGGTAATTTTCATTCAAAT 
AAAATAAAATTAATACCACTTAGTAGTGTTTTATTTTTGTATATTTGTATACAAATTAAT 
TTTTTTTCTTACCAGGAAGTTAAGCACACTGTGGATCAAAAGCACTACGTAAAACAACTG 
AACAAAATCAACCCATTTTATCAGAAGTGGAACTTCCTCCCATTTCTTCAGGTTCCTTAT 
CAACATCAGATTTTTATAAACCCAGGAGATCAGCATAAGACAAGTGTCTACCCCTTTGTT 
CCCACTAAATATATACAGTGGCCAGGCTCAGTGGCTCAGGCCTTCTTGTTTTATTCCTTT 
AAGGAAACACCAAAAAAGACTGTTGATATGGTAAAGTATTGTTTCTATCAGAAAACTGAG 
CTGACTGAAGAAGAAAAGAATGACCAAAAACATCTGAACAAAATCAACCAGTATTATCAG 
TTCACCTTGCCCCAATATGTAAAAGCTGTTTATCAATATCACAAAATTATGAAACCATGG 
AAAAACATGAAGACAAATGCTTACCAAGTTATCCCCACTCTGGTGAGTGCTCTCTTTTTA 
TTTGCAACTTAAAAATAGTTATCTGTTGTGCT 



In a search of public sequence databases, the NOV20 nucleic acid sequence, located 
on chromosome 4, has 291 of 445 bases (65%) identical to a gb:GENBANK-ID:CHI249995|acc:AJ249995.1 
mRNA from Capra hircus (Capra hircus mRNA for alpha s2-casein 
(cnsls2 gene)). Public nucleotide databases include all GenBank databases and the 
GeneSeq patent database. 

The disclosed NOV20 polypeptide (SEQ ID NO:50) encoded by SEQ ID NO:49 has 
240 amino acid residues and is presented in Table 20B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV20 has a signal peptide and is 
likely to be localized extracellularly with a certainty of 0.5140. 



Table 20B. Encoded NOV20 protein sequence (SEQ ID NO:50). 



MKFFIFTCLLAVALAHHELKHVYKKKTNNNVSDKYRNVKNQISSPQEDKVRGNFHSNKIK 
LIPLSSVLFLYICIQINFFSYQEVKHTVDQKHYVKQLNKINPFYQKWNFLPFLQVPYQHQ 
IFINPGDQHKTSVYPFVPTKYIQWPGSVAQAFLFYSFKETPKKTVDMVKYCFYQKTELTE 
EEKNDQKHLNKINQYYQFTLPQYVKAVYQYHKIMKPWKNMKTNAYQVIPTLVSALFLFAT 



A search of sequence databases reveals that the NOV20 amino acid sequence has 112 
of 232 amino acid residues (48%) identical to, and 142 of 232 amino acid residues (61%) 
similar to, the 235 amino acid residue ptnr:pir-id:A48383 protein from pig (alpha s2-casein). 
Public amino acid databases include the GenBank databases, SwissProt, PDB and PIR. 

NOV20 is expressed in at least lung, testis, and b-cell. This information was derived by 
determining the tissue sources of the sequences that were included in the invention including 
but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE 
sources. 

The disclosed NOV20 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 20C. 



Table 20C. BLAST results for NOV20 


Gene Index/Identifier           Protein/Organism                         Length (aa)  Identity (%)   Positives (%)  Expect
gi|477186|pir||A48383           alpha s2-casein - pig                    235          111/234 (47%)  141/234 (59%)  2e-37
gi|729044|sp|P39036|CAS2_PIG    Alpha-S2 casein precursor                235          109/234 (46%)  138/234 (58%)  2e-35
gi|115658|sp|P04654|CAS2_SHEEP  Alpha-S2 casein precursor                223          100/227 (44%)  128/227 (56%)  3e-28
gi|416751|sp|P33049|CAS2_CAPHI  Alpha-S2 casein precursor (Alpha-S2-CN)  223          96/227 (42%)   125/227 (54%)  4e-28
gi|423305|pir||S33881           alphas2-casein B - goat                  223          96/227 (42%)   125/227 (54%)  4e-28



Table 20D lists the domain descriptions from DOMAIN analysis results against 
NOV20. This indicates that the NOV20 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 20D. Domain Analysis of NOV20 

gnl|Pfam|pfam00363, caseins, Casein. 

CD-Length = 84 residues, 71.4% aligned 

Score = 44.3 bits (103), Expect = 8e-06 



NOV20 has homology to pig alpha-s2 casein. Caseins are the major protein constituent 
of milk. Caseins can be classified into two families: the first consists of the kappa-caseins, and 
the second groups the alpha-s1, alpha-s2, and beta-caseins. The alpha/beta caseins are a 
rapidly diverging family of proteins. However, two regions are conserved: a cluster of 
phosphorylated serine residues and the signal sequence. Alpha-s2 casein is known as 
epsilon-casein in mouse, gamma-casein in rat and casein-A in guinea pig. Alpha-s1 casein is 
known as alpha-casein in rat and rabbit and as casein-B in guinea pig. 

Milk casein can be separated by urea starch electrophoresis into 3 regions, alpha, beta 
(115460), and kappa (601695) casein. Alpha and beta variants are present in the human 
population. Voglino and Ponzone (See Voglino, G. F.; Ponzone, A.: Nature N.B. 238: 149, 
1972) postulated 2 biallelic systems. In Italy the frequencies of the 2 alpha alleles were 0.908 and 
0.092; the 2 beta alleles had frequencies of 0.678 and 0.322. 

Fujiwara et al. (See Fujiwara, Y. et al., Hum. Genet. 99: 368-373, 1997) found that the 
human alpha-s1, beta-, and kappa-casein genes are closely linked and arranged in that order. 
By fluorescence in situ hybridization, they demonstrated that the casein gene family is 
localized to 4q21.1. Rijnkels et al. (See Rijnkels, M., et al., Mammalian Genome 8: 285-286, 
1997) concluded that the human 'locus' comprises at least 4 casein genes: 3 calcium-sensitive 
casein-like genes and 1 kappa-casein gene, in the order alpha-s1, beta, alpha-s2, 
kappa. The approximate size of the human casein gene locus is 350 kb. Chen et al. 
(See Chen, C.-S. et al., Cytogenet. Cell Genet. 69: 260-265, 1995) suggested that the casein 
cluster is located within 700 kb of the albumin (103600) gene cluster, which is located on 
4q13. 

The disclosed NOV20 nucleic acid of the invention encoding a casein-like protein 
includes the nucleic acid whose sequence is provided in Table 20A or a fragment thereof. The 
invention also includes a mutant or variant nucleic acid any of whose bases may be changed 
from the corresponding base shown in Table 20A while still encoding a protein that maintains 
its casein-like activities and physiological functions, or a fragment of such a nucleic acid. The 
invention further includes nucleic acids whose sequences are complementary to those just 
described, including nucleic acid fragments that are complementary to any of the nucleic acids 
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or 
complements thereto, whose structures include chemical modifications. Such modifications 
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least 
in part to enhance the chemical stability of the modified nucleic acid, such that they may be 
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. 
In the mutant or variant nucleic acids, and their complements, up to about 35 percent of the 
bases may be so changed. 
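
The complementary nucleic acids referred to above, including candidate antisense sequences, are conventionally written as the reverse complement of the disclosed strand. A minimal Python sketch of that operation follows. The function name `reverse_complement` is hypothetical, and the sketch handles only the four unmodified DNA bases, not the modified bases or backbones the paragraph describes.

```python
# Map each base to its Watson-Crick partner (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string, ignoring whitespace."""
    seq = "".join(dna.split())
    return seq.translate(_COMPLEMENT)[::-1]


# First 12 bases of the NOV20 sequence (Table 20A), ending in the ATG start codon.
print(reverse_complement("CAAACAACCATG"))
```

Applying the function twice returns the original sequence, which is a quick sanity check on any complement computed this way.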

The disclosed NOV20 protein of the invention includes the casein-like protein whose 
sequence is provided in Table 20B. The invention also includes a mutant or variant protein 
any of whose residues may be changed from the corresponding residue shown in Table 20B 
while still maintaining its casein-like activities and physiological 
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 52 
percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or 
(Fab)2, that bind immunospecifically to any of the proteins of the invention. 

The above defined information for this invention suggests that this casein-like protein 
(NOV20) may function as a member of a "casein family". Therefore, the NOV20 nucleic acids 
and proteins identified here may be useful in potential therapeutic applications implicated in 
(but not limited to) various pathologies and disorders as indicated below. The potential 
therapeutic applications for this invention include, but are not limited to: protein therapeutic, 
small molecule drug target, antibody target (therapeutic, diagnostic, drug targeting/cytotoxic 
antibody), diagnostic and/or prognostic marker, gene therapy (gene delivery/gene ablation), 
research tools, tissue regeneration in vivo and in vitro of all tissues and cell types composing 
(but not limited to) those defined here. 

The NOV20 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the casein-like protein 
(NOV20) may be useful in gene therapy, and the casein-like protein (NOV20) may be useful 
when administered to a subject in need thereof. By way of nonlimiting example, the 
compositions of the present invention will have efficacy for treatment of patients suffering 
from fertility, hypogonadism, systemic lupus erythematosus, autoimmune disease, asthma, 
emphysema, scleroderma, allergy, ARDS, hemophilia, hypercoagulation, idiopathic 
thrombocytopenic purpura, autoimmune disease, allergies, immunodeficiencies, 
transplantation, graft versus host disease (GVHD), lymphaedema, or other pathologies or 
conditions. The NOV20 nucleic acid encoding the casein-like protein of the invention, or 
fragments thereof, may further be useful in diagnostic applications, wherein the presence or 
amount of the nucleic acid or the protein is to be assessed. 

NOV20 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immunospecifically to the novel NOV20 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX 
Antibodies" section below. The disclosed NOV20 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 

NOV21 

A disclosed NOV21 nucleic acid of 1704 nucleotides (also referred to as CG57654-01) 
encoding a gamma-aminobutyric acid receptor-like protein is shown in Table 21A. The start 
and stop codons are in bold letters. 



Table 21A. NOV21 nucleotide sequence (SEQ ID NO:51). 

CCTGACGCTTTGATGGTATCTGCAAGCGTTTTTQCTGATCTTATCTCTGCCCCCTGAATA 
TTAATTCCCTAATCTGGTAGCAATCCATCTCCCCAGTGAAGGACCTACTAGAGGCAGGTG 
GGGGGAGCCACCATCAGATCATCAAGCATAAGAATAATACAAAGGGGAGGGATTCTTCTG 
CAACCAAGAGGCAAGAGGCGAGAGAAGGAAAAAAAAAAAAAAAGCGATGAGTTCACCAAA 
TATATGGAGCACAGGAAGCTCAGTCTACTCGACTCCTGTATTTTCACAGAAAATGACGGT 
GTGGATTCTGCTCCTGCTGTCGCTCTACCCTGGCTTCACTAGCCAGAAATCTGATGATGA 
CTATGAAGATTATGCTTCTAACAAAACATGGGTCTTGACTCCAAAAGTTCCTGAGGGTGA 
TGTCACTGTCATCTTAAACAACCTGCTGGAAGGATATGACAATAAACTTCGGCCTGATAT 
AGGAGTGAAGCCAACGTTAATTCACACAGACATGTATGTGAATAGCATTGGTCCAGTGAA 
CGCTATCAATATGGAATACACTATTGATATATTTTTTGCGCAAATGTGGTATGACAGACG 
TTTGAAATTTAACAGCACCATTAAAGTCCTCCGATTGAACAGCAACATGGTGGGGAAAAT 
CTGGATTCCAGACACTTTCTTCAGAAATTCCAAAAAAGCTGATGCACACTGGATCACCAC 
CCCCAACAGGATGCTGAGAATTTGGAATGATGGTCGAGTGCTCTACTCCCTAAGGTTGAC 
AATTGATGCTGAGTGCCAATTACAATTGCACAATTTTCCAATGGATGAACACTCCTGCCC 
CTTGGAGTTCTCCAGTTATGGCTATCCACGTGAAGAAATTGTTTATCAATGGAAGCGAAG 
TTCTGTTGAAGTGGGCGACACAAGATCCTGGAGGCTTTATCAATTCTCATTTGTTGGTCT 
AAGAAATACCACCGAAGTAGTGAAGACAACTTCCGGAGATTATGTGGTCATGTCTGTCTA 
CTTTGATCTGAGCAGAAGAATGGGATACTTTACCATCCAGACCTATATCCCCTGCACACT 
CATTGTCGTCCTATCCTGGGTGTCTTTCTGGATCAATAAGGATGCTGTTCCAGCCAGAAC 
ATCTTTAGGTATCACCACTGTCCTGACAATGACCACCCTCAGCACCATTGCCCGGAAATC 
GCTCCCCAAGGTCTCCTATGTCACAGCGATGGATCTCTTTGTATCTGTTTGTTTCATCTT 
TGTCTTCTCTGCTCTGGTGGAGTATGGCACCTTGCATTATTTTGTCAGCAACCGGAAACC 
AAGCAAGGACAAAGATAAAAAGAAGAAAAACCCTCTTCTTCGGATGTTTTCCTTCAAGGC 
CCCTACCATTGATATCCGCCCAAGATCAGCAACCATTCAAATGAATAATGCTACACACCT 
TCAAGAGAGAGATGAAGAGTACGGCTATGAGTGTCTGGACGGCAAGGACTGTGCCAGTTT 
TTTCTGCTGTTTTGAAGATTGTCGAACAGGAGCTTGGAGACATGGGAGGATACATATCCG 
CATTGCCAAAATGGACTCCTATGCTCGGATCTTCTTCCCCACTGCCTTCTGCCTGTTTAA 
TCTGGTCTATTGGGTCTCCTACCTCTACCTGTGAGGAGGTATGGGTTTTACTGATATGGT 
TCTTATTCACTGAGTCTCATGGAG 



In a search of public sequence databases, the NOV21 nucleic acid sequence, located on 
chromosome 5, has 1379 of 1398 bases (98%) identical to a gb:GENBANK-ID:HSGABAAS|acc:X15376.1 
mRNA from Homo sapiens (Human mRNA for GABA-A 
receptor, gamma 2 subunit). Public nucleotide databases include all GenBank databases and 
the GeneSeq patent database. 

The disclosed NOV21 polypeptide (SEQ ID NO:52) encoded by SEQ ID NO:51 has 
475 amino acid residues and is presented in Table 21B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV21 has a signal peptide and is 
likely to be localized at the plasma membrane with a certainty of 0.6000. The most likely 
cleavage site for a NOV21 peptide is between amino acids 39 and 40. 
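
A cleavage site "between amino acids 39 and 40" means that residues 1 to 39 form the predicted signal peptide and the mature chain begins at residue 40. The Python sketch below splits a precursor sequence at such a site. The function name `split_signal_peptide` is hypothetical, and the sketch simply applies the quoted position; it does not predict the site.

```python
def split_signal_peptide(precursor: str, last_signal_residue: int) -> tuple[str, str]:
    """Split a precursor into (signal peptide, mature chain).

    last_signal_residue is 1-based: 39 means cleavage between residues 39 and 40.
    """
    seq = "".join(precursor.split()).upper()
    if not 0 < last_signal_residue < len(seq):
        raise ValueError("cleavage position must fall inside the sequence")
    return seq[:last_signal_residue], seq[last_signal_residue:]


# First 60 residues of the NOV21 polypeptide (Table 21B).
nov21_start = "MSSPNIWSTGSSVYSTPVFSQKMTVWILLLLSLYPGFTSQKSDDDYEDYASNKTWVLTPK"
signal, mature = split_signal_peptide(nov21_start, 39)
print(signal)
print(mature)
```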



Table 21B. Encoded NOV21 protein sequence (SEQ ID NO:52). 



MSSPNIWSTGSSVYSTPVFSQKMTVWILLLLSLYPGFTSQKSDDDYEDYASNKTWVLTPK 
VPEGDVTVILNNLLEGYDNKLRPDIGVKPTLIHTDMYVNSIGPVNAINMEYTIDIFFAQM 
WYDRRLKFNSTIKVLRLNSNMVGKIWIPDTFFRNSKKADAHWITTPNRMLRIWNDGRVLY 
SLRLTIDAECQLQLHNFPMDEHSCPLEFSSYGYPREEIVYQWKRSSVEVGDTRSWRLYQF 
SFVGLRNTTEVVKTTSGDYVVMSVYFDLSRRMGYFTIQTYIPCTLIVVLSWVSFWINKDA 
VPARTSLGITTVLTMTTLSTIARKSLPKVSYVTAMDLFVSVCFIFVFSALVEYGTLHYFV 
SNRKPSKDKDKKKKNPLLRMFSFKAPTIDIRPRSATIQMNNATHLQERDEEYGYECLDGK 
DCASFFCCFEDCRTGAWRHGRIHIRIAKMDSYARIFFPTAFCLFNLVYWVSYLYL 



A search of sequence databases reveals that the NOV21 amino acid sequence has 467 
of 475 amino acid residues (98%) identical to, and 467 of 475 amino acid residues (98%) 
similar to, the 467 amino acid residue ptnr:SWISSNEW-ACC:P18507 protein from Homo 
sapiens (Human) (GAMMA-AMINOBUTYRIC-ACID RECEPTOR GAMMA-2 SUBUNIT 
PRECURSOR (GABA(A) RECEPTOR)). Public amino acid databases include the GenBank 
databases, SwissProt, PDB and PIR. 

NOV21 is expressed in at least Adrenal Gland/Suprarenal gland, Brain, Hippocampus, 
Pituitary Gland, and Right Cerebellum. Expression information was derived from the tissue 
sources of the sequences that were included in the derivation of the sequence of CG57654-01. 
The sequence is also predicted to be expressed in fetal brain because of the expression 
pattern of a closely related homolog in Homo sapiens (gb:GENBANK-ID:HSGABAAS|acc:X15376.1, 
Human mRNA for GABA-A receptor, gamma 2 subunit). 
This information was derived by determining the tissue sources of the sequences 
that were included in the invention including but not limited to SeqCalling sources, Public 
EST sources, Literature sources, and/or RACE sources. 

The disclosed NOV21 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 21C. 



Table 21C. BLAST results for NOV21 


Gene Index/Identifier                    Protein/Organism                                                               Length (aa)  Identity (%)   Positives (%)  Expect
gi|4557611|ref|NP_000807.1| (NM_000816)  gamma-aminobutyric acid A receptor, gamma 2 precursor [Homo sapiens]           467          467/475 (98%)  467/475 (98%)  0.0
gi|5738138|gb|AAD50273.1| (AF165124)     gamma-aminobutyric acid A receptor gamma 2 [Homo sapiens]                      467          465/475 (97%)  466/475 (97%)  0.0
gi|120784|sp|P22300|GAC2_BOVIN           GAMMA-AMINOBUTYRIC-ACID RECEPTOR GAMMA-2 SUBUNIT PRECURSOR (GABA(A) RECEPTOR)  475          467/475 (98%)  470/475 (98%)  0.0
gi|108682|pir||B39221                    gamma-aminobutyric acid receptor A gamma-2L chain - bovine                     475          467/475 (98%)  470/475 (98%)  0.0
gi|6679915|ref|NP_032099.1| (NM_008073)  gamma-aminobutyric acid (GABA-A) receptor, subunit gamma 2 [Mus musculus]      474          469/475 (98%)  471/475 (98%)  0.0



Table 21D lists the domain descriptions from DOMAIN analysis results against 
NOV21. This indicates that the NOV21 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 21D. Domain Analysis of NOV21 

gnl|Pfam|pfam02931, Neur_chan_LBD, Neurotransmitter-gated ion-channel 
ligand binding domain. This family is the extracellular ligand binding 
domain of these ion channels. This domain forms a pentameric 
arrangement in the known structure. 

CD-Length = 200 residues, 91.5% aligned 

Score = 170 bits (431), Expect = 1e-43 



Neurotransmission effected by GABA (gamma-aminobutyric acid) is predominantly 
mediated by a gated chloride channel intrinsic to the GABAA receptor. This heterooligomeric 
receptor exists in most inhibitory synapses in the vertebrate central nervous system (CNS) and 
can be regulated by clinically important compounds such as benzodiazepines and barbiturates. 
The primary structures of GABAA receptor alpha- and beta-subunits have been deduced from 
cloned complementary DNAs. Co-expression of these subunits in heterologous systems 
generates receptors which display much of the pharmacology of their neural counterparts, 
including potentiation by barbiturates. Conspicuously, however, they lack binding sites for, 
and consistent electrophysiological responses to, benzodiazepines. Pritchett et al. (See 
Pritchett et al., Nature 1989;338:582-5) reported the isolation of a cloned cDNA encoding a new GABAA receptor 
subunit, termed gamma 2, which shares approximately 40% sequence identity with alpha- and 
beta-subunits and whose messenger RNA is prominently localized in neuronal subpopulations 
throughout the CNS. Importantly, coexpression of the gamma 2 subunit with alpha 1 and beta 
1 subunits produces GABAA receptors displaying high-affinity binding for central 
benzodiazepine receptor ligands. 

The disclosed NOV21 nucleic acid of the invention encoding a gamma-aminobutyric 
acid receptor-like protein includes the nucleic acid whose sequence is provided in Table 21A
or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of
whose bases may be changed from the corresponding base shown in Table 21A while still
encoding a protein that maintains its gamma-aminobutyric acid receptor-like activities and 
physiological functions, or a fragment of such a nucleic acid. The invention further includes 
nucleic acids whose sequences are complementary to those just described, including nucleic 
acid fragments that are complementary to any of the nucleic acids just described. The 
invention additionally includes nucleic acids or nucleic acid fragments, or complements
thereto, whose structures include chemical modifications. Such modifications include, by way
of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones
are modified or derivatized. These modifications are carried out at least in part to enhance the
chemical stability of the modified nucleic acid, such that they may be used, for example, as
antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or
variant nucleic acids, and their complements, up to about 2 percent of the bases may be so 
changed. 
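
By way of nonlimiting illustration, the notion of a nucleic acid "complementary to" a given sequence can be sketched in a few lines of Python. The function name and the 9-base example fragment are illustrative assumptions, not part of the disclosed methods, and the sketch handles only the four unmodified DNA bases (modified bases and backbones are outside its scope).

```python
# Illustrative sketch only: reverse complement of an unmodified DNA sequence.
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string containing only A, C, G, T."""
    seq = seq.upper()
    try:
        # Walk the strand backwards and pair each base with its complement.
        return "".join(_COMPLEMENT[base] for base in reversed(seq))
    except KeyError as exc:
        raise ValueError(f"unexpected base: {exc.args[0]!r}") from None


# Arbitrary 9-base fragment chosen for illustration.
print(reverse_complement("ATGCGGCTG"))
```

Applying the function twice returns the original strand, which is a convenient sanity check when preparing antisense sequences.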

The disclosed NOV21 protein of the invention includes the gamma-aminobutyric acid
receptor-like protein whose sequence is provided in Table 21B. The invention also includes a
mutant or variant protein any of whose residues may be changed from the corresponding
residue shown in Table 21B while still encoding a protein that maintains its gamma-aminobutyric
acid receptor-like activities and physiological functions, or a functional fragment
thereof. In the mutant or variant protein, up to about 2 percent of the residues may be so
changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this gamma-aminobutyric
acid receptor-like protein (NOV21) may function as a member of a "gamma-aminobutyric
acid receptor family". Therefore, the NOV21 nucleic acids and proteins
identified here may be useful in potential therapeutic applications implicated in (but not
limited to) various pathologies and disorders as indicated below. The potential therapeutic
applications for this invention include, but are not limited to: protein therapeutic, small
molecule drug target, antibody target (therapeutic, diagnostic, drug targeting/cytotoxic
antibody), diagnostic and/or prognostic marker, gene therapy (gene delivery/gene ablation),
research tools, tissue regeneration in vivo and in vitro of all tissues and cell types composing
(but not limited to) those defined here.

The NOV21 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the gamma-aminobutyric
acid receptor-like protein (NOV21) may be useful in gene therapy, and the gamma-aminobutyric
acid receptor-like protein (NOV21) may be useful when administered to a
subject in need thereof. By way of nonlimiting example, the compositions of the present
invention will have efficacy for treatment of patients suffering from adrenoleukodystrophy,
congenital adrenal hyperplasia, Von Hippel-Lindau (VHL) syndrome, Alzheimer's disease,
stroke, tuberous sclerosis, hypercalcemia, Parkinson's disease, Huntington's disease, cerebral
palsy, epilepsy, Lesch-Nyhan syndrome, multiple sclerosis, ataxia-telangiectasia,
leukodystrophies, behavioral disorders, addiction, anxiety, pain, neurodegeneration, endocrine
dysfunctions, diabetes, obesity, growth and reproductive disorders, or other pathologies or
conditions. The NOV21 nucleic acid encoding the gamma-aminobutyric acid receptor-like
protein of the invention, or fragments thereof, may further be useful in diagnostic applications,
wherein the presence or amount of the nucleic acid or the protein is to be assessed.

NOV21 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV21 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti- 
NOVX Antibodies" section below. The disclosed NOV21 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 
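
By way of nonlimiting illustration, one common way to locate hydrophilic regions from a hydrophobicity chart is a sliding-window average over a per-residue hydropathy scale. The sketch below assumes the Kyte-Doolittle scale, a window of 7 residues and a cutoff of -1.5; none of these choices is specified in this document, and the function names are hypothetical.

```python
# Illustrative sketch: sliding-window hydropathy profile (Kyte-Doolittle scale assumed).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 7) -> list:
    """Mean hydropathy of every window of `window` consecutive residues."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophilic_windows(protein: str, window: int = 7, cutoff: float = -1.5) -> list:
    """Return (1-based start position, peptide) for windows at or below the cutoff."""
    return [
        (i + 1, protein[i:i + window])
        for i, mean in enumerate(hydropathy_profile(protein, window))
        if mean <= cutoff
    ]


# Hypothetical test peptide: a hydrophobic stretch followed by a hydrophilic one.
print(hydrophilic_windows("IIIIIIIKKKKKKK"))
```

Windows with strongly negative averages are candidate surface-exposed segments; whether a given segment is a useful immunogen would still need to be confirmed experimentally.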

NOV22 

A disclosed NOV22 nucleic acid of 1602 nucleotides (also referred to as CG57724-01)
encoding a carboxylesterase-like protein is shown in Table 22A. The start and stop codons are 
in bold letters. 




Table 22A. NOV22 nucleotide sequence (SEQ ID NO:53). 

ATGCGGCTGCACAGACTTCACGCGCGGCCGAGCGCGGTGGCCTGTGGGCTCCTGCTGCTTCTGATGCTGT 
GTGGGCCCGAAGTTGCTCAGCCTGAAGTAGACACCACCCTGGGTCGTGTGCGAGGCCGGCAGGTGGGCGT 
GAAGGGCACAGACCGCCTTGTGAATGTCTTTCTGGGCATTCCATTTGCCCAGCCGCCACTGGGCCCTGAC 
CGGTTCTCAGCCCCACACCCAGCACAGCCCTGGGAGGGTGTGCGGGATGCCAGCACTGCGCCCCCAATGT 
GCCTACAAGACGTGATGAACAGCAGCAGATTTGTCCTCAACGGAAAACAGCAGATCTTCTCCGTTTCAGA
GGACTGCCTGGTCCTCAACGTCTATAGCCCAGCTGAGGTCATGGTATGGGTCCATGGAGGCGCTCTGATA 
ACTGGCGCTGCCACCTCCTACGATGGATCAGCTCTGGCTGCCTATGGGGATGTGGTCGTGGTTACAGTCC 
AGTACCGCCTTGGGGTCCTTGGCTTCTTCAGCACTGGAGATGAGCATGCACCTGGCAACCAGGGCTTCCT 
AGATGTGGTAGCTGCTTTGCGCTGGGTGCAAGAAAACATCGCCCCCTTCGGGGGTGACCTCAACTGTGTC 
ACTGTCTTTGGTGGATCTGCCGGTGGGAGCATCATCTCTGGCCTGGTCCTGTCCCCAGTGGCTGCAGGGC 
TGTTCCACAGAGCCATCACACAGAGTGGGGTCATCACCACCCCAGGGATCATCGACTCTCACCCTTGGCC
CCTAGCTCAGAAAATCGCAAACACCTTGGCCTGCAGCTCCAGCTCCCCGGCTGAGATGGTGCAGTGCCTT 
CAGCAGAAAGAAGGAGAAGAGCTGGTCCTTAGCAAGAAGCTGAAAAATACTATCTATCCTCTCACCGTTG 
ATGGCACTGTCTTCCCCAAAAGCCCCAAGGAACTCCTGAAGGAGAAGCCCTTCCACTCTGTGCCCTTCCT 
CATGGGTGTCAACAACCATGAGTTCAGCTGGCTCATCCCCGGGACCAAGGTGATGCGTGTGTCCAACAAG 
ATGATCATGAAGTTCCCGCTAAACCGGCAGGCGATGAGAAAGGAAACCATCACTAAGATGCTCTGGAGTA
CCCGCACCCTGTTGGAGCATGACTGGAAGATGCTACGAAACCGTATGATGGACATAGTTCAAGATGCCAC
TTTCGTGTATGCCACACTGCAGACTGCTCACTACCACCGAGATGCCGGCCTCCCTGTCTACCTGTATGAA 
TTTGAGCACC^CGCTCGTGGAATAATCGTCAAACCCCGCACTGATGGGGCAGACCATGGGGATGAGATGT 
ACTTCCTCTTTGGGGGCCCCTTCGCCACAGGCCTTTCCATGGGTAAGGAGAAGGCACTTAGCCTCCAGAT 
GATGAAATACTGGGCCAACTTTGCCCGCACAGGAAACCCCAATGATGGGAATCTGCCCTGCTGGCCACGC 
TACAACAAGGATGAAAAGTACCTGCAGCTGGATTTTACCACAAGAGTGGGCATGAAGCTCAAGGAGAAGA
AGATGGCTTTTTGGATGAGTCTGTACCAGTCTCAAAGACCTGAGAAGCAGAGGCAATTCTAA



In a search of public sequence databases, the NOV22 nucleic acid sequence, located on
chromosome 16, has 695 of 735 bases (94%) identical to a
gb:GENBANK-ID:AK000105|acc:AK000105.1 mRNA from Homo sapiens (Homo sapiens cDNA FLJ20098
fis, clone COL04537, highly similar to ESTM_MOUSE LIVER CARBOXYLESTERASE
PRECURSOR). Public nucleotide databases include all GenBank databases and the GeneSeq
patent database.

The disclosed NOV22 polypeptide (SEQ ID NO:54) encoded by SEQ ID NO:53 has
533 amino acid residues and is presented in Table 22B using the one-letter amino acid code.
SignalP, Psort and/or Hydropathy results predict that NOV22 has a signal peptide and is
likely to be localized extracellularly with a certainty of 0.7953. The most likely cleavage site
for a NOV22 peptide is at amino acid position 29.
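
By way of nonlimiting illustration, the relationship between an open reading frame and its encoded polypeptide can be sketched as a codon-by-codon translation using the standard genetic code. The function name is hypothetical, and the sketch assumes the reading frame starts at the first base supplied. On that assumption, a 1602-nucleotide frame ending in a stop codon corresponds to 534 codons, that is, 533 residues plus the stop.

```python
# Illustrative sketch: translate an open reading frame with the standard genetic code.
from itertools import product

_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codons are enumerated TTT, TTC, TTA, TTG, TCT, ... to match the amino acid string.
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(orf: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    orf = orf.upper()
    residues = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[i:i + 3]]
        if aa == "*":  # stop codon terminates translation
            break
        residues.append(aa)
    return "".join(residues)


# First 24 bases of the sequence in Table 22A.
print(translate("ATGCGGCTGCACAGACTTCACGCG"))
```

Comparing the translated output against the residues listed in Table 22B is a simple consistency check on both sequences.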



Table 22B. Encoded NOV22 protein sequence (SEQ ID NO:54). 



MRLHRLHARPSAVACGLLLLLMLCGPEVAQPEVDTTLGRVRGRQVGVKGTDRLVNVFLGI 

PFAQPPLGPDRFSAPHPAQPWEGVRDASTAPPMCLQDVMNSSRFVLNGKQQIFSVSEDCL 

VLNVYSPAEVMVWVHGGALITGAATSYDGSALAAYGDVVVVTVQYRLGVLGFFSTGDEHA

PGNQGFLDVVAALRWVQENIAPFGGDLNCVTVFGGSAGGSIISGLVLSPVAAGLFHRAIT

QSGVITTPGIIDSHPWPLAQKIANTLACSSSSPAEMVQCLQQKEGEELVLSKKLKNTIYP 

LTVDGTVFPKSPKELLKEKPFHSVPFLMGVNNHEFSWLIPGTKVMRVSNKMIMKFPLNRQ

AMRKETITKMLWSTRTLLEHDWKMLRNRMMDIVQDATFVYATLQTAHYHRDAGLPVYLYE 

FEHHARGIIVKPRTDGADHGDEMYFLFGGPFATGLSMGKEKALSLQMMKYWANFARTGNP 

NDGNLPCWPRYNKDEKYLQLDFTTRVGMKLKEKKMAFWMSLYQSQRPEKQRQF 



A search of sequence databases reveals that the NOV22 amino acid sequence has 296
of 544 amino acid residues (54%) identical to, and 373 of 544 amino acid residues (68%)
similar to, the 554 amino acid residue ptnr:SWISSPROT-ACC:Q63880 protein from Mus
musculus (Mouse) (LIVER CARBOXYLESTERASE PRECURSOR (EC 3.1.1.1) (ES-MALE)
(ESTERASE-31)). Public amino acid databases include the GenBank databases,
SwissProt, PDB and PIR.
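
By way of nonlimiting illustration, the percentages quoted alongside the match counts can be recomputed from the counts themselves. The figures in this section appear consistent with truncation to a whole percent rather than rounding (for example, 373 of 544 is about 68.6% and is reported as 68%); that reading is an inference from the numbers, and the function name below is hypothetical.

```python
# Illustrative sketch: whole-percent identity or similarity from alignment counts.
def percent_matches(matches: int, aligned_length: int) -> int:
    """Return matches / aligned_length as a truncated whole percentage."""
    if aligned_length <= 0 or not 0 <= matches <= aligned_length:
        raise ValueError("counts must satisfy 0 <= matches <= aligned_length, length > 0")
    return (100 * matches) // aligned_length


print(percent_matches(695, 735))  # nucleotide identity for NOV22
print(percent_matches(296, 544))  # amino acid identity for NOV22
print(percent_matches(373, 544))  # amino acid similarity for NOV22
```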

NOV22 is expressed in at least liver, colon, small intestine, kidney, pancreas, brain,
and plasma. Expression information was derived from the tissue sources of the sequences that
were included in the derivation of the sequence of CG57724-01. The sequence is predicted to
be expressed in the following tissues because of the expression pattern of
gb:GENBANK-ID:AK000105|acc:AK000105.1, a closely related Homo sapiens cDNA
FLJ20098 fis, clone COL04537, highly similar to ESTM_MOUSE LIVER
CARBOXYLESTERASE PRECURSOR homolog in species Homo sapiens: liver, colon,
small intestine, kidney, pancreas, brain, and plasma. This information was derived by
determining the tissue sources of the sequences that were included in the invention including
but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE
sources.

The disclosed NOV22 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 22C. 



Table 22C. BLAST results for NOV22

Gene Index/Identifier                          Protein/Organism                                                   Length (aa)  Identity (%)   Positives (%)  Expect
gi|17512361|gb|AAH19147.1|AAH19147 (BC019147)  (protein for MGC:29382) [Mus musculus]                             568          293/547 (53%)  370/547 (67%)  e-156
gi|2494382|sp|Q63880|ESTM_MOUSE                LIVER CARBOXYLESTERASE PRECURSOR (ES-MALE) (ESTERASE-31)           554          297/560 (53%)  369/560 (65%)  e-155
gi|14789873|gb|AAH10812.1|AAH10812 (BC010812)  (protein for IMAGE:4211034) [Mus musculus]                         524          272/538 (50%)  346/538 (63%)  e-140
gi|730714|sp|Q04791|SASB_ANAPL                 FATTY ACYL-COA HYDROLASE PRECURSOR, MEDIUM CHAIN (THIOESTERASE B)  557          252/547 (46%)  346/547 (63%)  e-136
gi|57554|emb|CAA46391.1| (X65296)              carboxylesterase [Rattus rattus]                                   565          (42%)          336/559 (59%)  e-124



Table 22D lists the domain descriptions from DOMAIN analysis results against 
NOV22. This indicates that the NOV22 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 22D. Domain Analysis of NOV22 

gnl|Pfam|pfam00135, COesterase, Carboxylesterase

CD-Length = 532 residues, 93.0% aligned 

Score = 374 bits (960), Expect = 8e-105 
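
By way of nonlimiting illustration, a bit score and an Expect value of the kind reported above are related by the standard expression E = N x 2^(-S'), where S' is the bit score and N is the effective search space (query length times database length). The search space used for the analyses in this document is not stated, so the numbers in the sketch below are hypothetical and the function names are illustrative only.

```python
# Illustrative sketch: relation between bit score, search space and Expect value.
def expect_from_bit_score(bit_score: float, search_space: float) -> float:
    """E = search_space * 2 ** (-bit_score)."""
    return search_space * 2.0 ** (-bit_score)


def implied_search_space(bit_score: float, expect: float) -> float:
    """Invert the relation to estimate the search space behind a reported pair."""
    return expect * 2.0 ** bit_score


# Hypothetical values chosen for illustration, not taken from the tables above.
print(expect_from_bit_score(50.0, 1.0e9))
print(implied_search_space(50.0, 1.0e-6))
```

Because the Expect value scales linearly with the search space, the same bit score yields a larger Expect value against a larger database.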



The mammalian carboxylesterases (EC 3.1.1.1) comprise a multigene family, the gene 
products of which are localized in the endoplasmic reticulum (ER) and cytosol of many 
tissues. These enzymes efficiently catalyze the hydrolysis of a variety of ester- and amide- 
containing chemicals, as well as drugs (including prodrugs) to the respective free acids. They 
are involved in detoxification or metabolic activation of various drugs, environmental 
toxicants, and carcinogens. Carboxylesterases also catalyze the hydrolysis of endogenous
compounds such as short- and long-chain acylglycerols, long-chain acylcarnitine, and
long-chain acyl-CoA esters. Multiple isozymes of hepatic microsomal carboxylesterases exist in
various animal species, and some of these isozymes are involved in the metabolic activation of 
certain carcinogens and are associated with hepatocarcinogenesis. 

Several studies have shown that various carboxylesterases are present in a wide variety
of organs and tissues of many mammalian species; the highest hydrolase activity occurs in the
liver. Humans express carboxylesterase in the liver, small intestine, brain, stomach, colon,
pancreas, kidney, macrophages, monocytes, and plasma. Carboxylesterases, in addition to the
metabolism of exogenous compounds, have been shown to hydrolyze endogenous fatty acid
esters of steroids in both rat pancreas and kidney. The nonspecific esterases found in brain
appear to be present only in the central nervous system, and four unique carboxylesterases
have been isolated from human brain extract. Carboxylesterase activity is found
predominantly in the microsomal fraction, although significant carboxylesterase activity is
present in the lysosomal fraction, and the lysosomes contribute substantially to the general
esterolytic capacity of liver. The microsomal and lysosomal enzymes can be differentiated on
the basis of both substrate specificity and structure and are considered to belong to separate
classes. Carboxylesterase activity is also found in the cytosolic fraction of brain and in the
plasma. Carboxylesterase is present in the plasma, but it is most likely synthesized in liver
and then secreted into the circulation via the Golgi apparatus.

Human liver and plasma carboxylesterase activates lovastatin, and there are a
significant number of additional drugs and endogenous compounds that are substrates of
carboxylesterases, e.g., dipivefrin hydrochloride, carbonates, cocaine, salicylates, capsaicin,
palmitoyl-coenzyme A, haloperidol, imidapril, pyrrolizidine alkaloids, and steroids.

The disclosed NOV22 nucleic acid of the invention encoding a carboxylesterase-like
protein includes the nucleic acid whose sequence is provided in Table 22A or a fragment
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may
be changed from the corresponding base shown in Table 22A while still encoding a protein
that maintains its carboxylesterase-like activities and physiological functions, or a fragment of
such a nucleic acid. The invention further includes nucleic acids whose sequences are
complementary to those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 6 percent of the bases may be so changed.

The disclosed NOV22 protein of the invention includes the carboxylesterase-like 
protein whose sequence is provided in Table 22B. The invention also includes a mutant or 
variant protein any of whose residues may be changed from the corresponding residue shown 
in Table 22B while still encoding a protein that maintains its carboxylesterase-like activities 
and physiological functions, or a functional fragment thereof. In the mutant or variant protein, 
up to about 46 percent of the residues may be so changed. 
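
By way of nonlimiting illustration, the percentage limits recited above can be converted into approximate counts of changeable positions. The sketch assumes the 1602-nucleotide and 533-residue lengths stated for NOV22 and truncates to a whole number; since the text says "up to about", the results are indicative only, and the function name is hypothetical.

```python
# Illustrative sketch: approximate number of positions covered by a percentage limit.
def max_changed_positions(length: int, percent: int) -> int:
    """Largest whole number of positions not exceeding `percent` of `length`."""
    if length < 0 or not 0 <= percent <= 100:
        raise ValueError("length must be >= 0 and percent between 0 and 100")
    return (length * percent) // 100


print(max_changed_positions(1602, 6))   # NOV22 nucleic acid, about 6 percent of bases
print(max_changed_positions(533, 46))   # NOV22 protein, about 46 percent of residues
```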

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this carboxylesterase- 
like protein (NOV22) may function as a member of a "carboxylesterase family". Therefore, 
the NOV22 nucleic acids and proteins identified here may be useful in potential therapeutic 
applications implicated in (but not limited to) various pathologies and disorders as indicated 
below. The potential therapeutic applications for this invention include, but are not limited to: 
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug 
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene 
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues 
and cell types composing (but not limited to) those defined here. 

The NOV22 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the carboxylesterase-like 
protein (NOV22) may be useful in gene therapy, and the carboxylesterase-like protein 
(NOV22) may be useful when administered to a subject in need thereof. By way of 
nonlimiting example, the compositions of the present invention will have efficacy for 
treatment of patients suffering from hepatocarcinoma, as well as other diseases, disorders and 
conditions, along with patients receiving pharmacotherapy with drug classes known to be 
metabolized by carboxylesterases, such as salicylates, carbonates, pyrrolizidine alkaloids, and 
steroids, or other pathologies or conditions. The NOV22 nucleic acid encoding the
carboxylesterase-like protein of the invention, or fragments thereof, may further be useful in
diagnostic applications, wherein the presence or amount of the nucleic acid or the protein is
to be assessed.

NOV22 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV22 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti- 
NOVX Antibodies" section below. The disclosed NOV22 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 



NOV23 

A disclosed NOV23 nucleic acid of 996 nucleotides (also referred to as CG57730-01)
encoding a MAT-1-like protein is shown in Table 23A. The start and stop codons are in bold
letters.



Table 23A. NOV23 nucleotide sequence (SEQ ID NO:55).

CTCTCGAATTCCCCCACCCACCTGTACTCTGGAGAGACTGTGCTGGGAACATGTACCACT
GAGCCTGAGATGGGGATGAGGGCAGAGAGAGGGGAGCCCCCTCTTCCACTCAGTTGTTCC
TACTCAGACTGTTGCACTCTAAACCTAGGGAGGTTGAAGAATGAGACCCTTAGGTTTTAA
CACGAATCCTGACACCACCATCTATAGGGTCCCAACTTGGTTATTGTAGGCAACCTTCCC
TCTCTCCTTGGTGAAGAACATCCCAAGCCAGAAAGAAGTTAACTACAGTGTTTTCCTTTG
CACCGATCCCCACCCCAATTCAATCCCGGAAGGGACTTACTTAGGAAACCCTTCTTTACT
AGATATCCTGGCCCCCTGGGCTTGTGAACACCTCCTAGCCACATCACTACAGTACAGTGA
GTGACCCCAGCCTCCTGCCTACCCCAAGATGCCCCTCCCCACCCTGACCGTGCTAACTGT
GTGTACATATATATTCTACATATATGTATATTAAAACTGCACTGCCATGTCTGCCCTTTT
TTGTGGTGTCTAGCATTAACTTATTGTCTAGGCCAGAGCGGGGGTGGGAGGGGAATGCCA 
CAGTGAAGGGAGTGGCAGAATCAAATTGCTACATAGTCCAAACAAAAAAGAAGGCTTTTT 
CAAAAAACATTAAATTCACATGCAGTCTCAGAGACTATTTAGACAAAGTTCAAGTTAGGA 
GCTTTTAGGATGTGGGAGTAAAACTTTAATGGGAGGGGAGGGCTGGCTGCTGGAAGAAGG
AAGAAGCCAGACTGGTTAGACAGTACTCTTAACTCCTAGCCCAGCCTAGCGTGCCCTGCC
CCTCTGGCCACTGCTGCAGACACCTGCCTTAACACACACACCTCTAGGACTCCACAGTTT
TGCCTTAAAGGACCTTCCCAAGTCTCCCTTTCCCTGTCTGGCTTCTCCCTTAAGAAGAGA
GAGATACTTGTAGAATTGGGTGGGGGAATTCGAGAG



In a search of public sequence databases, the NOV23 nucleic acid sequence, located on
chromosome 1q21.1, has 983 of 987 bases (99%) identical to a
gb:GENBANK-ID:PEAGENE3|acc:AF153274.1 mRNA from Homo sapiens (Homo sapiens PEA15 protein
(PEA15) gene, exons 3 and 4 and complete cds). Public nucleotide databases include all
GenBank databases and the GeneSeq patent database.

The disclosed NOV23 polypeptide (SEQ ID NO:56) encoded by SEQ ID NO:55 has
74 amino acid residues and is presented in Table 23B using the one-letter amino acid code.
SignalP, Psort and/or Hydropathy results predict that NOV23 has a signal peptide and is
likely to be localized in the endoplasmic reticulum with a certainty of 0.5500. The most likely
cleavage site for a NOV23 peptide is between amino acids 18 and 19.



Table 23B. Encoded NOV23 protein sequence (SEQ ID NO:56). 

MYIKTALPCLPPFVVSSINLLSRPERGWEGNATVKGVAESNCYIVQTKKKAFSKNIKFTC
SLRDYLDKVQVRSF



A search of sequence databases reveals that the NOV23 amino acid sequence has 62 of
75 amino acid residues (82%) identical to, and 66 of 75 amino acid residues (88%) similar to,
the 75 amino acid residue ptnr:SPTREMBL-ACC:Q14801 protein from Homo sapiens
(Human) (HYPOTHETICAL 8.6 KDA PROTEIN). Public amino acid databases include the
GenBank databases, SwissProt, PDB and PIR.

NOV23 is expressed in at least ovary, testis, brain, amygdala, pancreas, colon, and
stomach. This information was derived by determining the tissue sources of the sequences
that were included in the invention including but not limited to SeqCalling sources, Public
EST sources, Literature sources, and/or RACE sources.

The disclosed NOV23 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 23C. 



Table 23C. BLAST results for NOV23

Gene Index/Identifier                          Protein/Organism                                                                             Length (aa)  Identity (%)   Positives (%)  Expect
gi|14714659|gb|AAH10469.1|AAH10469 (BC010469)  Similar to homolog of mouse MAT-1 oncogene [Homo sapiens]                                    74           74/74 (100%)   74/74 (100%)   1e-34
gi|7019425|ref|NP_037419.1| (NM_013287)        homolog of mouse MAT-1 oncogene; Phosphoprotein enriched in astrocytes, 15kD [Homo sapiens]  75           62/75 (82%)    66/75 (87%)    5e-27
gi|6678812|ref|NP_032582.1| (NM_008556)        phosphoprotein enriched in astrocytes 15; mammary transforming gene 1 [Mus musculus]         61           27/30 (90%)    27/30 (90%)    8e-9

An efficient in vitro transformation system has been developed using N-methyl-N-nitrosourea
that allows the role of hormones and growth factors in mouse mammary
tumorigenesis to be studied. Utilizing this system, it was reported that mammary tumors
induced in vitro with N-methyl-N-nitrosourea in the presence of mammogenic hormones
(progesterone and prolactin) contain predominantly an activated c-Ki-ras protooncogene with
a G35 -> A35 transition mutation in the 12th codon. Mammary tumors induced in the
presence of another mitogen, lithium (Li), do not have a mutation in the c-Ki-ras
protooncogene. By using an expression cloning system, a plasmid clone containing a 1.75-kb
cDNA insert has been isolated from this group of tumors. Nucleic acid sequence analysis of
the insert reveals that it has a short open reading frame of 61 amino acids and that it does not
have sequence homology with any known gene. The gene, designated MAT1, can
neoplastically transform NIH 3T3 cells and also the mammary epithelial cell line TM3.
Expression of this gene occurs in normal mouse tissues including mammary gland and is
overexpressed in the original mammary tumors as indicated by Northern blot analysis. In vitro
transcription and translation of the clone shows a protein product of 6000 Da, which agrees
with the predicted open reading frame.

The disclosed NOV23 nucleic acid of the invention encoding a MAT-1-like protein
includes the nucleic acid whose sequence is provided in Table 23A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 23A while still encoding a protein that maintains
its MAT-1-like activities and physiological functions, or a fragment of such a nucleic acid.
The invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 1 percent of the
bases may be so changed.

The disclosed NOV23 protein of the invention includes the MAT-1-like protein whose
sequence is provided in Table 23B. The invention also includes a mutant or variant protein
any of whose residues may be changed from the corresponding residue shown in Table 23B
while still encoding a protein that maintains its MAT-1-like activities and physiological
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 12
percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this MAT-1-like protein
(NOV23) may function as a member of a "MAT-1 family". Therefore, the NOV23 nucleic
acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV23 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the MAT-1-like protein
(NOV23) may be useful in gene therapy, and the MAT-1-like protein (NOV23) may be useful
when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from Cataract, zonular pulverulent-1; MHC class II deficiency, complementation group C;
cancer, Von Hippel-Lindau (VHL) syndrome, Alzheimer's disease, stroke, tuberous sclerosis,
hypercalcemia, Parkinson's disease, Huntington's disease, cerebral palsy, epilepsy,
Lesch-Nyhan syndrome, multiple sclerosis, ataxia-telangiectasia, leukodystrophies, behavioral
disorders, addiction, anxiety, pain, neurodegeneration; diabetes, pancreatitis, obesity; fertility,
or other pathologies or conditions. The NOV23 nucleic acid encoding the MAT-1-like protein
of the invention, or fragments thereof, may further be useful in diagnostic applications,
wherein the presence or amount of the nucleic acid or the protein is to be assessed.

NOV23 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV23 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV23 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.

NOV24 

A disclosed NOV24 nucleic acid of 668 nucleotides (also referred to as CG57755-01) 
encoding a vacuolar proton-ATPase subunit H-like protein is shown in Table 24A. The start
and stop codons are in bold letters. 



Table 24A. NOV24 nucleotide sequence (SEQ ID NO:57). 



TTCGCCCTCCCGGTCATCATCTTCACCACGTTCTGGGGCCTCGTCGGCATCGCCGGGCCC 
TGGTTCGTGCCGAAGGGACCCAACCGCGGAGTGATCATCACCATGCTGGTCGCCACCGCC 
GTCTGCTGTTACCTCTTCTGGCTCATCGCCATCCTGGCGCAGCTGAACCCCCTGTTCGGG 
CCCCAGCTGAAGAATGAGACCATCTGGTACGTGCGCTTCCTGTGGGAGTGACCCGCCGCC 
CCCGACCCAGGTGCCCAGCTCTCGGAATGACTGTGGCTCCACTGTCCCTGACAACCCCTT
CGTCCGGACCCTCCCCCACACAACTATGTCTGGTCACCAGCTCCCTCCTGCTGGCACCCA 
GAGACCCGGACCCGCAGGGCCTGCCTGGTTCCTGGAAGTCTTCCCAGTCTTCCCAGCCAG
CCCGGGCCCTGGGGAGCCCTGGGCACAGCAGCGGCCGAGGGGATGTCCTGCTCCAATACC 
CGCACTGCTCTGGAGTTTGCCCTCTTTCCCAAGGAGATGCTGCTGGGGAGCTGGTATGGG 
TGGGGTCTTTCCCTTTACAGACGGGGCAGATGCCAGGACTCAGCCCATCCTGAGGAGGAC
ACGTGTCCTCATGGAGAGGGTGCTCCGGCCCAGGCGGGGGAGTCGGTGCCCAGTCAGCAG 
GACCAGGC



In a search of public sequence databases, the NOV24 nucleic acid sequence has 169 of 
230 bases (73%) identical to a gb:GENBANK-ID:AF258614|acc:AF258614.1 mRNA from 
Canis familiaris (Canis familiaris vacuolar proton-ATPase subunit ATP6H (ATP6H) mRNA, 
complete cds). Public nucleotide databases include all GenBank databases and the GeneSeq 
patent database. 

The disclosed NOV24 polypeptide (SEQ ID NO:58) encoded by SEQ ID NO:57 has 
76 amino acid residues and is presented in Table 24B using the one-letter amino acid code. 
SignalP, Psort and/or Hydropathy results predict that NOV24 has a signal peptide and is
likely to be localized in the plasma membrane with a certainty of 0.6400. The most likely
cleavage site for a NOV24 peptide is between amino acids 53 and 54. 
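
By way of nonlimiting illustration, a predicted cleavage site "between amino acids 53 and 54" can be applied to the sequence of Table 24B by simple slicing. The function name is hypothetical, and the split reflects only the prediction quoted above, not an experimentally determined processing site.

```python
# Illustrative sketch: split a polypeptide at a predicted signal peptide cleavage site.
def split_at_cleavage_site(protein: str, last_signal_residue: int):
    """Return (signal peptide, mature chain); positions are 1-based."""
    if not 0 < last_signal_residue < len(protein):
        raise ValueError("cleavage position must fall inside the sequence")
    return protein[:last_signal_residue], protein[last_signal_residue:]


# NOV24 sequence as listed in Table 24B (76 residues).
NOV24 = (
    "FALPVIIFTTFWGLVGIAGPWFVPKGPNRGVIITMLVATAVCCYLFWLIAILAQLNPLFG"
    "PQLKNETIWYVRFLWE"
)

signal_peptide, mature_chain = split_at_cleavage_site(NOV24, 53)
print(len(signal_peptide), len(mature_chain))
print(mature_chain)
```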



Table 24B. Encoded NOV24 protein sequence (SEQ ID NO:58). 



FALPVIIFTTFWGLVGIAGPWFVPKGPNRGVIITMLVATAVCCYLFWLIAILAQLNPLFG
PQLKNETIWYVRFLWE



A search of sequence databases reveals that the NOV24 amino acid sequence has 56 of
73 amino acid residues (76%) identical to, and 64 of 73 amino acid residues (87%) similar to,
the 81 amino acid residue ptnr:SPTREMBL-ACC:Q9N0Q1 protein from Canis familiaris
(Dog) (VACUOLAR PROTON-ATPASE SUBUNIT ATP6H). Public amino acid databases
include the GenBank databases, SwissProt, PDB and PIR.

The disclosed NOV24 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 24C. 




Table 24C. BLAST results for NOV24

Gene Index/Identifier                            Protein/Organism                                                                  Length (aa)  Identity (%)   Positives (%)  Expect
gi|14789787|gb|AAH10790.1|AAH10790 (BC010790)    (protein for MGC:18845) [Mus musculus]                                            81           75/76 (98%)    76/76 (99%)    4e-28
gi|18600961|ref|XP_095170.1| (XM_095170)         protein XP_095170 [Homo sapiens]                                                  109          65/86 (75%)    67/86 (77%)    9e-22
gi|8050819|gb|AAF71753.1|AF258614_1 (AF258614)   vacuolar proton-ATPase subunit ATP6H [Canis familiaris]                           81           56/73 (76%)    64/73 (86%)    3e-21
gi|13507387|gb|AAK28556.1|AF343440_1 (AF343440)  lysosomal H+ transporting-ATPase subunit M9.2 [Canis familiaris]                  81           55/73 (75%)    64/73 (87%)    6e-21
gi|13384606|ref|NP_079548.1| (NM_025272)         ATPase, H+ transporting lysosomal (vacuolar proton pump), 9.2 kDa [Mus musculus]  81           55/73 (75%)    65/73 (88%)    8e-21



Vacuolar-type H(+)-ATPase (V-ATPase) is a multisubunit enzyme responsible for 
acidification of eukaryotic intracellular organelles. V-ATPase-dependent organelle 
acidification is essential for intracellular processes such as protein sorting, zymogen 
activation, and receptor-mediated endocytosis. The V-ATPase is composed of peripheral (V1) 
and integral (V0) membrane sectors. Proteolipids are major components of the V0 sector. In C. 
elegans, Oka et al. (See Oka, T. et al., J. Biol. Chem. 272: 24387-24392, 1997) identified the 
VHA1 and VHA2 genes, which encode 16-kD proteolipids, and the VHA4 gene, which 
encodes a 23-kD proteolipid product. Nishigori et al. (1998) isolated human cDNAs encoding 
a proteolipid which they designated 'ATP6F.' Sequence analysis revealed that the predicted 
205-amino acid protein shares 61% identity with the S. cerevisiae proteolipid VMA16 and 
67% identity with C. elegans VHA4. ATP6F contains 5 transmembrane segments and a 
conserved glutamic acid residue that is essential for proton transport activity in VMA16. As 
with ATP6C (108745), a 16-kD V-ATPase proteolipid, the N- and C-terminal halves of 
ATP6F share homology and may have resulted from a gene duplication event. The duplicated 
segments of ATP6F and ATP6C are 75% similar on the amino acid level. Northern blot 
analysis indicated that the 1.1-kb ATP6F mRNA was expressed in all tissues tested. The 
ATP6F gene contains 8 exons and spans approximately 4 kb. By FISH and radiation hybrid 
analysis, Nishigori et al. (1998) mapped the ATP6F gene to 1p32.3. (See Nishigori et al., 
Genomics 50: 222-228, 1998). 

The vacuolar proton-ATPase (V-ATPase) is composed of an extramembrane catalytic 
sector and a transmembrane proton-conducting sector. See 603717. Ludwig et al. (1998) 
identified 2 novel proteins, 8-9 and 9.2 kD in size, in the membrane sector of bovine 
chromaffin granule V-ATPase. They designated the larger protein M9.2. By searching an EST 
database with the N-terminal sequence of bovine M9.2, Ludwig et al. (See Ludwig, et al., J. 
Biol. Chem. 273: 10939-10947, 1998) identified homologous cDNAs from human and mouse. 
The deduced 80-amino acid human M9.2 protein is extremely hydrophobic with 2 predicted 
membrane-spanning helices. Human and mouse M9.2 differed at only 1 amino acid position. 
Northern blot analysis revealed that M9.2 was present in all bovine tissues tested. 

The disclosed NOV24 nucleic acid of the invention encoding a vacuolar proton-ATPase 
subunit H-like protein includes the nucleic acid whose sequence is provided in Table 
24A or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of 
whose bases may be changed from the corresponding base shown in Table 24A while still 
encoding a protein that maintains its vacuolar proton-ATPase subunit H-like activities and 
physiological functions, or a fragment of such a nucleic acid. The invention further includes 
nucleic acids whose sequences are complementary to those just described, including nucleic 
acid fragments that are complementary to any of the nucleic acids just described. The 
invention additionally includes nucleic acids or nucleic acid fragments, or complements 
thereto, whose structures include chemical modifications. Such modifications include, by way 
of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones 
are modified or derivatized. These modifications are carried out at least in part to enhance the 
chemical stability of the modified nucleic acid, such that they may be used, for example, as 
antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or 
variant nucleic acids, and their complements, up to about 27 percent of the bases may be so 
changed. 

The disclosed NOV24 protein of the invention includes the vacuolar proton-ATPase 
subunit H-like protein whose sequence is provided in Table 24B. The invention also includes 
a mutant or variant protein any of whose residues may be changed from the corresponding 
residue shown in Table 24B while still encoding a protein that maintains its vacuolar 
proton-ATPase subunit H-like activities and physiological functions, or a functional fragment thereof. 
In the mutant or variant protein, up to about 24 percent of the residues may be so changed. 
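
One way to read the "up to about 24 percent" limit for the protein, and the corresponding 27 percent limit for the nucleic acid, is as a position-by-position comparison against the reference sequence. The following is a minimal Python sketch under that reading, assuming equal-length sequences with substitutions only and no gaps; the text does not specify how the percentage is computed, and the variant shown is hypothetical.

```python
def percent_changed(reference, variant):
    """Percentage of positions at which variant differs from reference.

    Assumes both sequences have the same length (substitutions only).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    changed = sum(a != b for a, b in zip(reference, variant))
    return 100.0 * changed / len(reference)


def within_limit(reference, variant, max_percent):
    """True if the variant changes no more than max_percent of positions."""
    return percent_changed(reference, variant) <= max_percent


reference = "FALPVIIFTT"  # first ten residues of the Table 24B sequence
variant = "FALPVLIFTT"    # hypothetical variant with one substitution
print(percent_changed(reference, variant))
print(within_limit(reference, variant, 24.0))
```

The 27 and 24 percent limits are the complements of the 73% nucleotide identity and 76% amino acid identity reported for the closest database matches.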

The invention further encompasses antibodies and antibody fragments, such as Fab or 
(Fab)2, that bind immunospecifically to any of the proteins of the invention. 

The above defined information for this invention suggests that this vacuolar proton-ATPase 
subunit H-like protein (NOV24) may function as a member of a "vacuolar proton-ATPase 
subunit H family". Therefore, the NOV24 nucleic acids and proteins identified here 
may be useful in potential therapeutic applications implicated in (but not limited to) various 
pathologies and disorders as indicated below. The potential therapeutic applications for this 
invention include, but are not limited to: protein therapeutic, small molecule drug target, 
antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or 
prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue 
regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to) 
those defined here. 

The NOV24 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the vacuolar proton-ATPase 
subunit H-like protein (NOV24) may be useful in gene therapy, and the vacuolar 
proton-ATPase subunit H-like protein (NOV24) may be useful when administered to a subject 
in need thereof. By way of nonlimiting example, the compositions of the present invention 
will have efficacy for treatment of patients suffering from polycystic kidney disease I; 
osteopetrosis; mucolipidosis IV, or other pathologies or conditions. The NOV24 nucleic acid 
encoding the vacuolar proton-ATPase subunit H-like protein of the invention, or fragments 
thereof, may further be useful in diagnostic applications, wherein the presence or amount of 
the nucleic acid or the protein are to be assessed. 

NOV24 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV24 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the 
"Anti-NOVX Antibodies" section below. The disclosed NOV24 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 

NOV25 

A disclosed NOV25 nucleic acid of 5587 nucleotides (also referred to as CG57503-01) 
encoding a MEGF7-like protein is shown in Table 25A. The start and stop codons are in bold 
letters. 



Table 25A. NOV25 nucleotide sequence (SEQ ID NO:59). 

ATGGGCCTAGGAGTCATACTACCTACCTGTTCCCCTCTTGACTTTCACTGTGACAATGGCAAGTGCATCC 
GCCGCTCCTGGGTGTGTGACGGGGACAACGACTGTGAGGATGACTCGGATGAGCAGGACTGTCCCCCCCG 
GGAGTGTGAGGAGGACGAGTTTCCCTGCCAGAATGGCTACTGCATCCGGAGTCTGTGGCACTGCGATGGT 
GACAATGACTGTGGCGACAACAGCGATGAGCAGTGTGACATGCGCAAGTGCTCCGACAAGGAGTTCCGCT 
GTAGTGACGGAAGCTGCATTGCTGAGCATTGGTACTGCGACGGTGACACCGACTGCAAAGATGGCTCCGA 
TGAGGAGAACTGTCCCTCAGCAGTGCCAGCGCCCCCCTGCAACCTGGAGGAGTTCCAGTGTGCCTATGGA 
CGCTGCATCCTCGACATCTACCACTGCGATGGCGACGATGACTGTGGAGACTGGTCAGACGAGTCTGACT 
GCTGTGAGTACTCTGGCCAGCTGGGAGCCTCCCACCAGCCCTGCCGCTCTGGGGAGTTCATGTGTGACAG 
TGGCCTGTGCATCAATGCAGGCTGGCGCTGCGATGGTGACGCGGACTGTGATGACCAGTCTGATGAGCGC 
AACTGCACCACCTCCATGTGTACGGCAGAACAGTTCCGCTGTCACTCAGGCCGCTGTGTCCGCCTGTCCT 
GGCGCTGTGATGGGGAGGACGACTGTGCAGACAACAGCGATGAAGAGAACTGTGAGAATACAGGAAGCCC 
CCAATGTGCCTTGGACCAGTTCCTGTGTTGGAATGGGCGCTGCATTGGGCAGAGGAAGCTGTGCAACGGG 
GTCAACGACTGTGGTGACAACAGCGACGAAAGCCCACAGCAGAATTGCCGGCCCCGGACGGGTGAGGAGA 
ACTGCAATGTTAACAACGGTGGCTGTGCCCAGAAGTGCCAGATGGTGCGGGGGGCAGTGCAGTGTACCTG 
CCACACAGGCTACCGGCTCACAGAGGATGGGCACACGTGCCAAGATGTGAATGAATGTGCCGAGGAGGGG 
TATTGCAGCCAGGGCTGCACCAACAGCGAAGGGGCTTTCCAATGCTGGTGTGAAACAGGCTATGAACTAC 
GGCCCGACCGGCGCAGCTGCAAGGCTCTGGGGCCAGAGCCTGTGCTGCTGTTCGCCAATCGCATCGACAT 
CCGGCAGGTGCTGCCACACCGCTCTGAGTACACACTGCTGCTTAACAACCTGGAGAATGCCATTGCCCTT 
GATTTCCACCACCGCCGCGAGCTTGTCTTCTGGTCAGATGTCACCCTGGACCGGATCCTCCGTGCCAACC 
TCAACGGCAGCAACGTGGAGGAGGTTGTGTCTACTGGGCTGGAGAGCCCAGGGGGCCTGGCTGTGGATTG 
GGTCCATGACAAACTCTACTGGACCGACTCAGGCACCTCGAGGATTGAGGTGGCCAATCTGGACGGGGCC 
CACCGGAAAGTGTTGCTGTGGCAGAACCTGGAGAAGCCCCGGGCCATTGCCTTGCATCCCATGGAGGGTA 
CCATTTACTGGACAGACTGGGGCAACACCCCCCGTATTGAGGCCTCCAGCATGGATGGCTCTGGACGCCG 
CATCATTGCCGATACCCATCTCTTCTGGCCCAATGGCCTCACCATCGACTATGCCGGGCGCCGTATGTAC 
TGGGTGGATGCTAAGCACCATGTCATCGAGAGGGCCAATCTGGATGGGAGTCACCGTAAGGCTGTCATTA 
GCCAGGGCCTCCCGCATCCCTTCGCCATCACAGTGTTTGAAGACAGCCTGTACTGGACAGACTGGCACAC 
CAAGAGCATCAATAGCGCTAACAAATTTACGGGGAAGAACCAGGAAATCATTCGCAACAAACTCCACTTC 
CCTATGGACATCCACACCTTGCACCCCCAGCGCCAACCTGCAGGGAAAAACCGCTGTGGGGACAACAACG 
GAGGCTGCACGCACCTGTGTCTGCCCAGTGGCCAGAACTACACCTGTGCCTGCCCCACTGGCTTCCGCAA 
GATCAGCAGCCACGCCTGTGCCCAGAGTCTTGACAAGTTCCTGCTTTTTGCCCGAAGGATGGACATCCGT 
CGAATCAGCTTTGACACAGAGGACCTGTCTGATGATGTCATCCCACTGGCTGACGTGCGCAGTGCTGTGG 
CCCTTGACTGGGACTCCCGGGATGACCACGTGTACTGGACAGATGTCAGCACTGATACCATCAGCAGGGC 
CAAGTGGGATGGAACAGGACAGGAGGTGGTAGTGGATACCAGTTTGGAGAGCCCAGCTGGCCTGGCCATT 
GATTGGGTCACCAACAAACTGTACTGGACAGATGCAGGTACAGACCGGATTGAAGTAGCCAACACAGATG 
GCAGCATGAGAACAGTACTCATCTGGGAGAACCTTGATCGTCCTCGGGACATCGTGGTGGAACCCATGGG 
CGGGTACATGTATTGGACTGACTGGGGTGCGAGCCCCAAGATTGAACGAGCTGGCATGGATGCCTCAGGC 
CGCCAAGTCATTATCTCTTCTAATCTGACCTGGCCTAATGGGTTAGCTATTGATTATGGGTCCCAGCGTC 
TATACTGGGCTGACGCCGGCATGAAGACAATTGAATTTGCTGGACTGGATGGCAGTAAGAGGAAGGTGCT 
GATTGGAAGCCAGCTCCCCCACCCATTTGGGCTGACCCTCTATGGAGAGCGCATCTATTGGACTGACTGG 
CAGACCAAGAGCATACAGAGCGCTGACCGGCTGACAGGGCTGGACCGGGAGACTCTGCAGGAGAACCTGG 
AAAACCTAATGGACATCCATGTCTTCCACCGCCGCCGGCCCCCAGTGTCTACACCATGTGCTATGGAGAA 
TGGCGGCTGTAGCCACCTGTGTCTTAGGTCCCCAAATCCAAGCGGATTCAGCTGTACCTGCCCCACAGGC 
ATCAACCTGCTGTCTGATGGCAAGACCTGCTCACCAGGCATGAACAGTTTCCTCATCTTCGCCAGGAGGA 
TAGACATTCGCATGGTCTCCCTGGACATCCCTTATTTTGCTGATGTGGTGGTACCAATCAACATTACCAT 
GAAGAACACCATTGCCATTGGAGTAGACCCCCAGGAAGGAAAGGTGTACTGGTCTGACAGCACACTGCAC 
AGGATCAGTCGTGCCAATCTGGATGGCTCACAGCATGAGGACATCATCACCACAGGGCTACAGACCACAG 
ATGGGCTCGCGGTTGATGCCATTGGCCGGAAAGTATACTGGACAGACACGGGAACAAACCGGATTGAAGT 
GGGCAACCTGGACGGGTCCATGCGGAAAGTGTTGGTGTGGCAGAACCTTGACAGTCCCCGGGCCATCGTA 
CTGTACCATGAGATGGGGTTTATGTACTGGACAGACTGGGGGGAGAATGCCAAGTTAGAGCGGTCCGGAA 
TGGATGGCTCAGACCGCGCGGTGCTCATCAACAACAACCTAGGATGGCCCAATGGACTGACTGTGGACAA 
GGCCAGCTCCCAACTGCTATGGGCCGATGCCCACACCGAGCGAATTGAGGCTGCTGACCTGAATGGTGCC 
AATCGGCATACATTGGTGTCACCGGTGCAGCACCCATATGGCCTCACCCTGCTCGACTCCTATATCTACT 
GGACTGACTGGCAGACTCGGAGCATCCACCGTGCTGACAAGGGTACTGGCAGCAATGTCATCCTCGTGAG 
GTCCAACCTGCCAGGCCTCATGGACATGCAGGCTGTGGACCGGGCACAGCCACTAGGTTTTAACAAGTGC 
GGCTCGAGAAATGGCGGCTGCTCCCACCTCTGCTTGCCTCGGCCTTCTGGCTTCTCCTGTGCCTGCCCCA 
CTGGCATCCAGCTGAAGGGAGATGGGAAGACCTGTGATCCCTCTCCTGAGACCTACCTGCTCTTCTCCAG 
CCGTGGCTCCATCCGGCGTATCTCACTGGACACCAGTGACCACACCAATGTGCATGTCCCTGTTCCTGAG 
CTCAACAATGTCATCTCCCTGGACTATGACAGCGTGGATGGAAAGGTCTATTACACAGATGTGTTCCTGG 
ATGTTATCAGGCGAGCAGACCTGAACGGCAGCAACATGGAGACAGTGATCGGGCGAGGGCTGAAGACCAC 
TGACGGGCTGGCAGTGGACTGGGTGGCCAGGAACCTGTACTGGACAGACACAGGTCGAAATACCATTGAG 
GCGTCCAGGCTGGATGGTTCCTGCCGCAAAGTACTGATCAACAATAGCCTGGATGAGCCCCGGGCCATTG 
CTGTTTTCCCCAGGAAGGGGTACCTCTTCTGGACAGACTGGGGCCACATTGCCAAGATCGAACGGGCAAA 
CTTGGATGGTTCTGAGCGGAAGGTCCTCATCAACACAGACCTGGGTTGGCCCAATGGCCTTACCCTGGAC 
TATGATACCCGCAGGATCTACTGGGTGGATGCGCATCTGGACCGGATCGAGAGTGCTGACCTCAATGGGA 
AACTGCGGCAGGTCTTGGTCAGCCATGTGTCCCACCCCTTTGCCCTCACACAGCAAGACAGGTGGATCTA 
CTGGACAGACTGGCAGACCAAGTCAATCCAGCGTGTTGACAAATACTCAGGCCGGAACAAGGAGACAGTG 
CTGGCAAATGTGGAAGGACTCATGGATATCATCGTGGTTTCCCCTCAGCGGCAGACAGGGACCAATGCCT 
GTGGTGTGAACAATGGTGGCTGCACCCACCTCTGCTTTGCCAGAGCCTCGGACTTCGTATGTGCCTGTCC 
TGACGAACCTGATAGCCAGCCCTGCTCCCTTGTGCCTGGCCTGGTACCACCAGCTCCTAGGGCTACTGGC 
ATGAGTGAAAAGAGCCCAGTGCTACCCAACACACCACCTACCACCTTGTATTCTTCAACCACCCGGACCC 
GCACGTCTCTGGAGGAGGTGGAAGGAAGATGCTCTGAAAGGGATGCCAGGCTGGGCCTCTGTGCACGTTC 
CAATGACGCTGTTCCTGCTGCTCCAGGGGAAGGACTTCATATCAGCTACGCCATTGGTGGACTCCTCAGT 
ATTCTGCTGATTTTGGTGGTGATTGCAGCTTTGATGCTGTACAGACACAAAAAATCCAAGTTCACTGATC 
CTGGAATGGGGAACCTCACCTACAGCAACCCCTCCTACCGAACATCCACACAGGAAGTGAAGATTGAAGC 
AATCCCCAAACCAGCCATGTACAACCAGCTGTGCTATAAGAAAGAGGGAGGGCCTGACCATAACTACACC 
AAGGAGAAGATCAAGATCGTAGAGGGAATCTGCCTCCTGTCTGGGGATGATGCTGAGTGGGATGACCTCA 
AGCAACTGCGAAGCTCACGGGGGGGCCTCCTCCGGGATCATGTATGCATGAAGACAGACACGGTGTCCAT 
CCAGGCCAGCTCTGGCTCCCTGGATGACACAGAGACGGAGCAGCTGTTACAGGAAGAGCAGTCTGAGTGT 
AGCAGCGTCCATACTGCAGCCACTCCAGAAAGACGAGGCTCTCTGCCAGACACGGGCTGGAAACATGAAC 
GCAAGCTCTCCTCAGAGAGCCAGGTCTAAATGCCCACATTCTCTTCCCTGCCTGCCT 



In a search of public sequence databases, the NOV25 nucleic acid sequence, located on 
chromosome 11, has 4754 of 4759 bases (99%) identical to a 
gb:GENBANK-ID:AB011540|acc:AB011540.1 mRNA from Homo sapiens (Homo sapiens mRNA for 
MEGF7, partial cds). Public nucleotide databases include all GenBank databases and the 
GeneSeq patent database. 

The disclosed NOV25 polypeptide (SEQ ID NO:60) encoded by SEQ ID NO:59 has 
1852 amino acid residues and is presented in Table 25B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV25 has no signal peptide and is 
likely to be localized at the plasma membrane with a certainty of 0.8200. 
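
The relationship between SEQ ID NO:59 and SEQ ID NO:60 can be checked by conceptual translation. The following is a minimal Python sketch, assuming the standard genetic code and a reading frame that begins at the first nucleotide of Table 25A; the helper names are illustrative and the input is only the first 60 nucleotides of the table.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest). '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(cds):
    """Translate a coding sequence in frame 1, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 60 nucleotides of the NOV25 sequence in Table 25A.
first_60_nt = "ATGGGCCTAGGAGTCATACTACCTACCTGTTCCCCTCTTGACTTTCACTGTGACAATGGC"
print(translate(first_60_nt))

# 1852 codons plus one stop codon occupy (1852 + 1) * 3 nucleotides.
coding_length = (1852 + 1) * 3
print(coding_length, 5587 - coding_length)
```

The translated fragment agrees with the first 20 residues listed in Table 25B, and the length arithmetic leaves 28 nucleotides of the 5587 outside the coding region.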



Table 25B. Encoded NOV25 protein sequence (SEQ ID NO:60). 

MGLGVILPTCSPLDFHCDNGKCIRRSWVCDGDNDCEDDSDEQDCPPRECEEDEFPCQNGY 
CIRSLWHCDGDNDCGDNSDEQCDMRKCSDKEFRCSDGSCIAEHWYCDGDTDCKDGSDEEN 
CPSAVPAPPCNLEEFQCAYGRCILDIYHCDGDDDCGDWSDESDCCEYSGQLGASHQPCRS 
GEFMCDSGLCINAGWRCDGDADCDDQSDERNCTTSMCTAEQFRCHSGRCVRLSWRCDGED 
DCADNSDEENCENTGSPQCALDQFLCWNGRCIGQRKLCNGVNDCGDNSDESPQQNCRPRT 
GEENCNVNNGGCAQKCQMVRGAVQCTCHTGYRLTEDGHTCQDVNECAEEGYCSQGCTNSE 
GAFQCWCETGYELRPDRRSCKALGPEPVLLFANRIDIRQVLPHRSEYTLLLNNLENAIAL 
DFHHRRELVFWSDVTLDRILRANLNGSNVEEVVSTGLESPGGLAVDWVHDKLYWTDSGTS 
RIEVANLDGAHRKVLLWQNLEKPRAIALHPMEGTIYWTDWGNTPRIEASSMDGSGRRIIA 
DTHLFWPNGLTIDYAGRRMYWVDAKHHVIERANLDGSHRKAVISQGLPHPFAITVFEDSL 
YWTDWHTKSINSANKFTGKNQEIIRNKLHFPMDIHTLHPQRQPAGKNRCGDNNGGCTHLC 
LPSGQNYTCACPTGFRKISSHACAQSLDKFLLFARRMDIRRISFDTEDLSDDVIPLADVR 
SAVALDWDSRDDHVYWTDVSTDTISRAKWDGTGQEVVVDTSLESPAGLAIDWVTNKLYWT 
DAGTDRIEVANTDGSMRTVLIWENLDRPRDIVVEPMGGYMYWTDWGASPKIERAGMDASG 
RQVIISSNLTWPNGLAIDYGSQRLYWADAGMKTIEFAGLDGSKRKVLIGSQLPHPFGLTL 
YGERIYWTDWQTKSIQSADRLTGLDRETLQENLENLMDIHVFHRRRPPVSTPCAMENGGC 
SHLCLRSPNPSGFSCTCPTGINLLSDGKTCSPGMNSFLIFARRIDIRMVSLDIPYFADVV 
VPINITMKNTIAIGVDPQEGKVYWSDSTLHRISRANLDGSQHEDIITTGLQTTDGLAVDA 
IGRKVYWTDTGTNRIEVGNLDGSMRKVLVWQNLDSPRAIVLYHEMGFMYWTDWGENAKLE 
RSGMDGSDRAVLINNNLGWPNGLTVDKASSQLLWADAHTERIEAADLNGANRHTLVSPVQ 
HPYGLTLLDSYIYWTDWQTRSIHRADKGTGSNVILVRSNLPGLMDMQAVDRAQPLGFNKC 
GSRNGGCSHLCLPRPSGFSCACPTGIQLKGDGKTCDPSPETYLLFSSRGSIRRISLDTSD 
HTNVHVPVPELNNVISLDYDSVDGKVYYTDVFLDVIRRADLNGSNMETVIGRGLKTTDGL 
AVDWVARNLYWTDTGRNTIEASRLDGSCRKVLINNSLDEPRAIAVFPRKGYLFWTDWGHI 
AKIERANLDGSERKVLINTDLGWPNGLTLDYDTRRIYWVDAHLDRIESADLNGKLRQVLV 
SHVSHPFALTQQDRWIYWTDWQTKSIQRVDKYSGRNKETVLANVEGLMDIIVVSPQRQTG 
TNACGVNNGGCTHLCFARASDFVCACPDEPDSQPCSLVPGLVPPAPRATGMSEKSPVLPN 
TPPTTLYSSTTRTRTSLEEVEGRCSERDARLGLCARSNDAVPAAPGEGLHISYAIGGLLS 
ILLILVVIAALMLYRHKKSKFTDPGMGNLTYSNPSYRTSTQEVKIEAIPKPAMYNQLCYK 
KEGGPDHNYTKEKIKIVEGICLLSGDDAEWDDLKQLRSSRGGLLRDHVCMKTDTVSIQAS 
SGSLDDTETEQLLQEEQSECSSVHTAATPERRGSLPDTGWKHERKLSSESQV 



A search of sequence databases reveals that the NOV25 amino acid sequence has 1572 
of 1576 amino acid residues (99%) identical to, and 1574 of 1576 amino acid residues (99%) 
similar to, the 1576 amino acid residue ptnr:SPTREMBL-ACC:O75096 protein from Homo 
sapiens (Human) (MEGF7). Public amino acid databases include the GenBank databases, 
SwissProt, PDB and PIR. 

NOV25 is expressed in at least adrenal gland/suprarenal gland, bone marrow, brain, 
bronchus, brown adipose, cartilage, cervix, colon, heart, hypothalamus, lung, peripheral blood, 
pituitary gland, spinal cord, stomach, testis, thalamus, and uterus. This information was derived 
by determining the tissue sources of the sequences that were included in the invention 
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or 
RACE sources. 

The disclosed NOV25 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 25C. 

Table 25C. BLAST results for NOV25 

Gene Index/Identifier                     Protein/Organism                                                   Length (aa)  Identity (%)     Positives (%)    Expect
gi|17224416|gb|AAL36970.1| (AF247637)     LDLR dan [Mus musculus]                                            1905         1779/1849 (96%)  1809/1849 (97%)  0.0
gi|3449306|dbj|BAA32468.1| (AB011540)     MEGF7 [Homo sapiens]                                               1576         1572/1576 (99%)  1574/1576 (99%)  0.0
gi|6681362|dbj|BAA88688.1| (AB011533)     MEGF7 [Rattus norvegicus]                                          1298         1248/1298 (96%)  1274/1298 (98%)  0.0
gi|17472590|ref|XP_061753.1| (XM_061753)  similar to MEGF7 (H. sapiens)                                      1007         992/992 (100%)   992/992 (100%)   0.0
gi|14763921|ref|XP_035037.1| (XM_035037)  low density lipoprotein receptor-related protein 4 [Homo sapiens]  859          857/859 (99%)    859/859 (99%)    0.0

Tables 25D-E list the domain descriptions from DOMAIN analysis results against 
NOV25. This indicates that the NOV25 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 25D. Domain Analysis of NOV25 

gnl|Smart|smart00192, LDLa, Low-density lipoprotein receptor domain 
class A; Cysteine-rich repeat in the low-density lipoprotein (LDL) 
receptor that plays a central role in mammalian cholesterol 
metabolism. The N-terminal type A repeats in LDL receptor bind the 
lipoproteins. Other homologous domains occur in related receptors, 
including the very low-density lipoprotein receptor and the LDL 
receptor-related protein/alpha 2-macroglobulin receptor, and in 
proteins which are functionally unrelated, such as the C9 component of 
complement. Mutations in the LDL receptor gene cause familial 
hypercholesterolemia. 

CD-Length = 38 residues, 97.4% aligned 
Score = 64.3 bits (155), Expect = 6e-11 



Table 25E. Domain Analysis of NOV25 

gnl|Smart|smart00135, LY, Low-density lipoprotein-receptor YWTD 
domain; Type "B" repeats in low-density lipoprotein (LDL) receptor 
that plays a central role in mammalian cholesterol metabolism. Also 
present in a variety of molecules similar to gp300/megalin. 

CD-Length = 43 residues, 95.3% aligned 
Score = 62.0 bits (149), Expect = 3e-10 



The domain that characterizes epidermal growth factor (EGF) consists of 
approximately 50 amino acids, and has been shown to be present, in a more or less conserved 
form, in a large number of other, mostly animal proteins. EGF-like domains are believed to 
play a critical role in a number of extracellular events, including cell adhesion and 
receptor-ligand interactions. Proteins with EGF-like domains often consist of more than 1,000 amino 
acids, have multiple copies of the EGF-like domain, and contain additional domains known to 
be involved in specific protein-protein interactions. The list of proteins currently known to 
contain one or more copies of an EGF-like pattern is large and varied. The functional 
significance of EGF domains in what appear to be unrelated proteins is not yet clear. However, 
a common feature is that these repeats are found in the extracellular domain of 
membrane-bound proteins or in proteins known to be secreted (exception: prostaglandin G/H synthase). 
The EGF domain includes six cysteine residues which have been shown (in EGF) to be 
involved in 3 disulfide bonds. The main structure is a two-stranded beta-sheet followed by a 
loop to a C-terminal short two-stranded sheet. Subdomains between the conserved cysteines 
vary in length. 

To identify proteins containing EGF-like domains, Nakayama et al. (1998) searched a 
database of long cDNA sequences randomly selected from a human brain cDNA library for 
those that encode an EGF-like motif. They identified several partial cDNAs encoding novel 
proteins with EGF-like domains, such as LRP4, which they named MEGF7. The predicted 
partial LRP4 protein contains 2 EGF-like domains, a calcium binding-type EGF-like domain, 
3 LDL receptor-type EGF-like domains, 4 YWTD spacer regions, a transmembrane domain, a 
cytoplasmic NPXY motif, which is required for clustering and internalization of LDL 
receptors, and a cytoplasmic tSXV motif, which anchors proteins with a PDZ domain. The 
sequence and domain organization of LRP4 shows significant similarities to those of members 
of the LDL receptor family. Northern blot analysis detected rat Megf7 expression in several 
regions of the brain. Using a radiation hybrid mapping panel, Nakayama et al. (1998) mapped 
the LRP4 gene to 11p12-p11.2. 

The disclosed NOV25 nucleic acid of the invention encoding a MEGF7-like protein 
includes the nucleic acid whose sequence is provided in Table 25A or a fragment thereof. The 
invention also includes a mutant or variant nucleic acid any of whose bases may be changed 
from the corresponding base shown in Table 25A while still encoding a protein that maintains 
its MEGF7-like activities and physiological functions, or a fragment of such a nucleic acid. 
The invention further includes nucleic acids whose sequences are complementary to those just 
described, including nucleic acid fragments that are complementary to any of the nucleic acids 
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or 
complements thereto, whose structures include chemical modifications. Such modifications 
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least 
in part to enhance the chemical stability of the modified nucleic acid, such that they may be 
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. 
In the mutant or variant nucleic acids, and their complements, up to about 1 percent of the 
bases may be so changed. 

The disclosed NOV25 protein of the invention includes the MEGF7-like protein whose 
sequence is provided in Table 25B. The invention also includes a mutant or variant protein 
any of whose residues may be changed from the corresponding residue shown in Table 25B 
while still encoding a protein that maintains its MEGF7-like activities and physiological 
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 1 
percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or 
(Fab)2, that bind immunospecifically to any of the proteins of the invention. 

The above defined information for this invention suggests that this MEGF7-like 
protein (NOV25) may function as a member of a "MEGF7 family". Therefore, the NOV25 
nucleic acids and proteins identified here may be useful in potential therapeutic applications 
implicated in (but not limited to) various pathologies and disorders as indicated below. The 
potential therapeutic applications for this invention include, but are not limited to: protein 
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug 
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene 
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues 
and cell types composing (but not limited to) those defined here. 

The NOV25 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the MEGF7-like protein 
(NOV25) may be useful in gene therapy, and the MEGF7-like protein (NOV25) may be useful 
when administered to a subject in need thereof. By way of nonlimiting example, the 
compositions of the present invention will have efficacy for treatment of patients suffering 
from adrenoleukodystrophy, congenital adrenal hyperplasia, hemophilia, hypercoagulation, 
idiopathic thrombocytopenic purpura, autoimmune disease, allergies, immunodeficiencies, 
transplantation, graft versus host; diseases of the brain and nervous system, including Von 
Hippel-Lindau (VHL) syndrome, Alzheimer's disease, stroke, tuberous sclerosis, 
hypercalcemia, Parkinson's disease, Huntington's disease, cerebral palsy, epilepsy, 
Lesch-Nyhan syndrome, multiple sclerosis, ataxia-telangiectasia, leukodystrophies, behavioral 
disorders, addiction, anxiety, pain, neuroprotection; diseases of the respiratory system, 
including systemic lupus erythematosus, autoimmune disease, asthma, emphysema, 
scleroderma, allergy, ARDS; diseases and disorders of adipose tissue, reproductive system, 
colon, circulatory system, spinal cord, digestive system, and endocrine system, or other 
pathologies or conditions. The NOV25 nucleic acid encoding the MEGF7-like protein of the 
invention, or fragments thereof, may further be useful in diagnostic applications, wherein the 
presence or amount of the nucleic acid or the protein are to be assessed. 

NOV25 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV25 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti- 
NOVX Antibodies" section below. The disclosed NOV25 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 



NOV26 

A disclosed NOV26 nucleic acid of 635 nucleotides (also referred to as CG57456-01) 
encoding a COP-Coated Vesicle Membrane Protein P24 Precursor-like protein is shown in 
Table 26A. The start and stop codons are in bold letters. 



Table 26A. NOV26 nucleotide sequence (SEQ ID NO:61). 

CCCCACTATGGTGACGCTCGCTGAGCTGCTGTTGCTCCAGAACACTCTCCTGACCATGGTCTTGGGCTAT 
TTCATCAGCATCCACGCACATGCTGAAGAATGCTTAAGTGAGCATGTCACCTCAGGCACCAAGATGGGCC 
TCATCTTCGAAGGTGGCTTCCTGGGCATCAACATGGAGATTACAGGACCTAAGAATAAAAGGATTTATAA 
AGGAGACAAAGAATCCAGTGGGAAATACACATTTTCTGCTCACATGGATGGAACAAATACATTTTGTTTT 
AGTGACCGAGTGTCCACCATGACTCCAAAGATAGTGATATTCACCATTGATATTGGGGAGGCTACAAAAA 
GAGAAGACATGGAAACAGAAGCTCACCAGAACAAACTAGAAGAAATGATCAGTGAGCTGGCTGTGGCCAT 
GACAGCTGTACAGCACAAAGAGGAATACACGAAAATCTGGGAGAGGATACACAGAGCCATTAGTGACAAC 
ACAAACAGCCCAGTGGTCCTTCGGTGCTTCTTTGAAGCTCTTGTTCTAATTGCCATGACATTGGGACACA 
TCTACTACCTGAAGAGATTTTTTGAAGTCCAGAGGGTTGTTTCAAAAGCCTCTTCCTGATGATTCCAAAC 
TCATA 



In a search of public sequence databases, the NOV26 nucleic acid sequence has 630 of 
635 bases (99%) identical to a gb:GENBANK-ID:AF152363|acc:AF152363.1 mRNA from 
Homo sapiens (Homo sapiens constitutive fragile region FRA3B sequence). Public nucleotide 
databases include all GenBank databases and the GeneSeq patent database. 

The disclosed NOV26 polypeptide (SEQ ID NO:62) encoded by SEQ ID NO:61 has 
203 amino acid residues and is presented in Table 26B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV26 has a signal peptide and is 
likely to be localized at the plasma membrane with a certainty of 0.4600. The most likely 
cleavage site for a NOV26 peptide is between amino acids 29 and 30. 



Table 26B. Encoded NOV26 protein sequence (SEQ ID NO:62). 



MVTLAELLLLQNTLLTMVLGYFISIHAHAEECLSEHVTSGTKMGLIFEGGFLGINMEITG 
PKNKRIYKGDKESSGKYTFSAHMDGTNTFCFSDRVSTMTPKIVIFTIDIGEATKREDMET 
EAHQNKLEEMISELAVAMTAVQHKEEYTKIWERIHRAISDNTNSPVVLRCFFEALVLIAM 
TLGHIYYLKRFFEVQRVVSKASS 



A search of sequence databases reveals that the NOV26 amino acid sequence has 156 
of 201 amino acid residues (77%) identical to, and 175 of 201 amino acid residues (87%) 
similar to, the 201 amino acid residue ptnr:SWISSNEW-ACC:Q15363 protein from Homo 
sapiens (Human) (COP-COATED VESICLE MEMBRANE PROTEIN P24 PRECURSOR 
(P24A) (RNP24)). Public amino acid databases include the GenBank databases, SwissProt, 
PDB and PIR. 

NOV26 is expressed in at least eye, placenta, colon, and ovary. This information was 
derived by determining the tissue sources of the sequences that were included in the invention 
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or 
RACE sources. 

The disclosed NOV26 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 26C. 



Table 26C. BLAST results for NOV26 

Gene Index/Identifier                     Protein/Organism                                           Length (aa)  Identity (%)   Positives (%)  Expect
gi|17440558|ref|XP_067504.1| (XM_067504)  similar to coated vesicle membrane protein [Homo sapiens]  271          186/192 (96%)  186/192 (96%)  e-100
gi|9790015|ref|NP_062744.1| (NM_019770)   coated vesicle membrane protein; Sid394p [Mus musculus]    201          155/201 (77%)  173/201 (85%)  4e-74
gi|1352660|sp|P49020|P24_CRIGR            Cop-coated vesicle membrane protein p24 precursor          196          148/196 (75%)  169/196 (85%)  6e-74
gi|5803149|ref|NP_006806.1| (NM_006815)   coated vesicle membrane protein [Homo sapiens]             201          156/201 (77%)  175/201 (86%)  e-73
gi|13929014|ref|NP_113910.1| (NM_031722)  coated vesicle membrane protein [Rattus norvegicus]        201          155/201 (77%)  174/201 (86%)  e-73

Table 26D lists the domain descriptions from DOMAIN analysis results against 
NOV26. This indicates that the NOV26 sequence has properties similar to those of other 
proteins known to contain this domain. 

Table 26D. Domain Analysis of NOV26 

gnl|Pfam|pfam01105, EMP24_GP25L, emp24/gp25L/p24 family. Members of 
this family are implicated in bringing cargo forward from the ER and 
binding to coat proteins by their cytoplasmic domains. 

CD-Length = 202 residues, 92.6% aligned 
Score = 135 bits (341), Expect = 2e-33 



Members of the p24 family of putative cargo receptors are proposed to contain
retrograde and anterograde trafficking signals in their cytoplasmic domain to facilitate coat
protein binding and cycling in the secretory pathway. The localization and transit of the
wild-type chimera from the endoplasmic reticulum (ER) through the Golgi complex involved a
glutamic acid residue and a conserved glutamine in the transmembrane domain (TMD). The TMD
glutamic acid mediated the localization of the chimeras to the ER in the absence of the
conserved glutamine. Efficient ER exit required the TMD glutamine and was further facilitated
by a pair of phenylalanine residues in the cytoplasmic tail. TMD residues of p24 proteins may
mediate the interaction with integral membrane proteins of the vesicle budding machinery to
ensure p24 packaging into transport vesicles.

Blum et al. (1996) identified a 21-kD rat pancreatic microsomal membrane protein that
they designated Tmp21. By probing a human brain cDNA library with a fragment of the rat
sequence, they isolated a cDNA encoding human TMP21. The deduced 219-amino acid type I
intracellular transmembrane protein contains a signal sequence and is predicted to be located
in the lumen of the endoplasmic reticulum. Northern blot analysis detected a 1.4-kb TMP21
transcript. Immunoblot analysis showed that the rat Tmp21 protein is expressed predominantly
in the microsomal fraction of pancreatic acinar cells. Horer et al. (1999) determined that a
putative TMP21 isoform, TMP21-II, is a neutral pseudogene.

The disclosed NOV26 nucleic acid of the invention encoding a COP-Coated Vesicle
Membrane Protein P24 Precursor-like protein includes the nucleic acid whose sequence is
provided in Table 26A or a fragment thereof. The invention also includes a mutant or variant
nucleic acid any of whose bases may be changed from the corresponding base shown in Table
26A while still encoding a protein that maintains its COP-Coated Vesicle Membrane Protein
P24 Precursor-like activities and physiological functions, or a fragment of such a nucleic acid.
The invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 1 percent of the
bases may be so changed.

The disclosed NOV26 protein of the invention includes the COP-Coated Vesicle
Membrane Protein P24 Precursor-like protein whose sequence is provided in Table 26B. The
invention also includes a mutant or variant protein any of whose residues may be changed
from the corresponding residue shown in Table 26B while still encoding a protein that
maintains its COP-Coated Vesicle Membrane Protein P24 Precursor-like activities and
physiological functions, or a functional fragment thereof. In the mutant or variant protein, up
to about 23 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this COP-Coated
Vesicle Membrane Protein P24 Precursor-like protein (NOV26) may function as a member of
a "COP-Coated Vesicle Membrane Protein P24 Precursor family". Therefore, the NOV26
nucleic acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV26 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the COP-Coated Vesicle
Membrane Protein P24 Precursor-like protein (NOV26) may be useful in gene therapy, and
the COP-Coated Vesicle Membrane Protein P24 Precursor-like protein (NOV26) may be
useful when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from Endometriosis, Fertility, Von Hippel-Lindau (VHL) syndrome, Diabetes, Tuberous
sclerosis, or other pathologies or conditions. The NOV26 nucleic acid encoding the
COP-Coated Vesicle Membrane Protein P24 Precursor-like protein of the invention, or fragments
thereof, may further be useful in diagnostic applications, wherein the presence or amount of
the nucleic acid or the protein is to be assessed.

NOV26 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immunospecifically to the novel NOV26 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV26 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding the pathology of the disease and in developing new drug targets for various
disorders.



NOV27 

A disclosed NOV27 nucleic acid of 1120 nucleotides (also referred to as CG57658-01)
encoding a connexin-like protein is shown in Table 27A. The start and stop codons are in bold
letters.



Table 27A. NOV27 nucleotide sequence (SEQ ID NO:63). 

GAGGCCATGCCCGCTTCCTCTCTTCCAGGAAAGCTCTGGTTCGTCCTCACGATGCTGCTGCGGATGCTGG 
TGATTGTCTTGGCGGGGCGACCCGTCTACCAGGACGAGCAGGAGAGGTTTGTCTGCAACACGCTGCAGCC 
GGGATGCGCCAATGTTTGCTACGACGTCTTCTCCCCCGTGTCTCACCTGCGGTTCTGGCTGATCCAGGGC 
GTGTGCGTCCTCCTCCCCTCCGCCGTCTTCAGCGTCTATGTCCTGCACCGAGGAGCCACGCTCGCCGCGC 
TGGGCCCCCGCCGCTGCCCCGACCCCCGGGAGCCGGCCTCCGGGCAGAGACGCTGCCCGCGGCCATTCGG 
GGAGCGCGGCGGCCTCCAGGTGCCCGACTTTTCGGCCGGCTACATCATCCACCTCCTCCTCCGGACCCTG 
CTGGAGGCAGCCTTCGGGGCCTTGCACTACTTTCTCTTTGGATTCCTGGCCCCGAAGAAGTTCCCTTGCA 
CGCGCCCTCCGTGCACGGGCGTGGTGGACTGCTACGTGTCGCGGCCCACAGAGAAGTCCCTGCTGATGCT 
GTTCCTCTGGGCGGTCAGCGCGCTGTCTTTTCTGCTGGGCCTCGCCGACCTGGTCTGCAGCCTGCGGCGG 
CGGATGCGCAGGAGGCCGGGACCCCCCACAAGCCCCTCCATCCGGAAGCAGAGCGGAGCCTCAGGCCACG 
CGGAGGGACGCCGGACTGACGAGGAGGGTGGGCGGGAGGAAGAGGGGGCACCGGCGCCCCCGGGTGCACG 
CGCCGGAGGGGAGGGGGCTGGCAGCCCCAGGCGTACATCCAGGGTGTCAGGGCACACGAAGATTCCGGAT 
GAGGATGAGAGTGAGGTGACATCCTCCGCCAGCGAAAAGCTGGGCAGACAGCCCCGGGGCAGGCCCCACC
GAGAGGCCGCCCAGGACCCCAGGGGCTCAGGATCCGAGGAGCAGCCCTCAGCAGCCCCCAGCCGCCTGGC
CGCGCCCCCTTCCTGCAGCAGCCTGCAGCCCCCTGACCCGCCTGCCAGCTCCAGTGGTGCTCCCCACCTG 
AGAGCCAGGAAGTCTGAGTGGGTGTGAAAAAAACAGCACCTGGCGGTGCCCCGGGGCTCACGCCTGTAAT 



In a search of public sequence databases, the NOV27 nucleic acid sequence, located on
chromosome 10, has 1037 of 1097 bases (94%) identical to a
gb:GENBANK-ID:AB046017|acc:AB046017.1 mRNA from Macaca fascicularis (Macaca fascicularis brain
cDNA, clone:QccE-15512). Public nucleotide databases include all GenBank databases and
the GeneSeq patent database.

The disclosed NOV27 polypeptide (SEQ ID NO:64) encoded by SEQ ID NO:63 has
356 amino acid residues and is presented in Table 27B using the one-letter amino acid code.
Signal P, Psort and/or Hydropathy results predict that NOV27 has a signal peptide and is
likely to be localized in the plasma membrane with a certainty of 0.6400. The most likely
cleavage site for a NOV27 peptide is between amino acids 26 and 27.
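
The relationship between SEQ ID NO:63 and SEQ ID NO:64 can be checked by conceptual translation. The following is a minimal sketch, assuming the standard genetic code and assuming that the coding region begins at the first ATG in the sequence; the names `CODON_TABLE` and `translate_orf` are illustrative and do not come from the disclosure. It does not handle ambiguous bases such as N.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(dna: str) -> str:
    """Translate from the first ATG to the first in-frame stop codon.

    Returns an empty string when no ATG is present. Translation also ends
    if the sequence runs out before a stop codon is reached.
    """
    dna = "".join(dna.split()).upper()  # drop whitespace left by line wrapping
    start = dna.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the open reading frame
            break
        protein.append(aa)
    return "".join(protein)


# First 39 bases of the Table 27A sequence (SEQ ID NO:63).
fragment = "GAGGCCATGCCCGCTTCCTCTCTTCCAGGAAAGCTCTGG"
print(translate_orf(fragment))  # MPASSLPGKLW
```

The printed residues match the first eleven residues of the Table 27B sequence. Passing the full 1120-nucleotide sequence to the same function is one way to compare the translated length against the 356 residues stated above.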



Table 27B. Encoded NOV27 protein sequence (SEQ ID NO:64). 

MPASSLPGKLWFVLTMLLRMLVIVLAGRPVYQDEQERFVCNTLQPGCANVCYDVFSPVSH 
LRFWLIQGVCVLLPSAVFSVYVLHRGATLAALGPRRCPDPREPASGQRRCPRPFGERGGL 
QVPDFSAGYIIHLLLRTLLEAAFGALHYFLFGFLAPKKFPCTRPPCTGVVDCYVSRPTEK
SLLMLFLWAVSALSFLLGLADLVCSLRRRMRRRPGPPTSPSIRKQSGASGHAEGRRTDEE 
GGREEEGAPAPPGARAGGEGAGSPRRTSRVSGHTKIPDEDESEVTSSASEKLGRQPRGRP 
HREAAQDFRGSGSEEQPSAAPSRLAAPPSCSSLQPPDPPASSSGAPHLRARKSEWV 



A search of sequence databases reveals that the NOV27 amino acid sequence has
348 of 348 amino acid residues (100%) identical to TREMBLNEW-ACC:CAC10186 BA425A6.2 (SIMILAR TO
CONNEXIN) - Homo sapiens. Public amino acid databases include the GenBank databases,
SwissProt, PDB and PIR.

NOV27 is expressed in at least brain, lung, ovary, and colon. This information was
derived by determining the tissue sources of the sequences that were included in the invention,
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or
RACE sources.

The disclosed NOV27 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 27C. 



Table 27C. BLAST results for NOV27

Gene Index/Identifier                     Protein/Organism                                           Length (aa)  Identity (%)    Positives (%)   Expect
gi|17489790|ref|XP_058368.1| (XM_058368)  similar to bA425A6.2 (similar to connexin) [Homo sapiens]  370          349/352 (99%)   351/352 (99%)   e-167
gi|10334641|emb|CAC10186.1| (AL121749)    bA425A6.2 (similar to connexin) [Homo sapiens]             348          348/348 (100%)  348/348 (100%)  e-163
gi|9280090|dbj|BAB01599.1| (AB046017)     unnamed protein product [Macaca fascicularis]              341          316/341 (92%)   323/341 (94%)   e-159
gi|17489782|ref|XP_061277.1| (XM_061277)  similar to connexin) [Homo sapiens]                        341          341/341 (100%)  341/341 (100%)  e-158
gi|15990849|emb|CAC93844.1| (AJ414562)    connexin39 [Mus musculus]                                  364          175/372 (47%)   215/372 (57%)   6e-67
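
The Identity and Positives columns of Table 27C pair a ratio of matched positions to aligned positions with a whole-number percentage. The sketch below recomputes those percentages from the ratios. It assumes the fractional part is truncated rather than rounded, which is consistent with every row of Table 27C but is an observation about these rows only and not a statement about how the search program computes the values. The function name `percent_truncated` is illustrative.

```python
def percent_truncated(ratio: str) -> int:
    """Convert a 'matches/aligned' string such as '316/341' to a whole percent.

    Integer division drops the fractional part instead of rounding it.
    """
    matches, aligned = (int(part) for part in ratio.split("/"))
    return matches * 100 // aligned


# (identity, positives) ratios copied from Table 27C, keyed by accession.
rows = {
    "XP_058368.1": ("349/352", "351/352"),
    "CAC10186.1": ("348/348", "348/348"),
    "BAB01599.1": ("316/341", "323/341"),
    "XP_061277.1": ("341/341", "341/341"),
    "CAC93844.1": ("175/372", "215/372"),
}

for accession, (identity, positives) in rows.items():
    print(accession, percent_truncated(identity), percent_truncated(positives))
```

For the Macaca fascicularis entry, for example, 316/341 is about 92.7%, which the table reports as 92% rather than 93%.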





Tables 27D-E list the domain descriptions from DOMAIN analysis results against 
NOV27. This indicates that the NOV27 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 27D. Domain Analysis of NOV27 

gnl|Pfam|pfam00029, connexin, Connexin.

CD-Length = 218 residues, 91.7% aligned

Score = 172 bits (436), Expect = 3e-44



Table 27E. Domain Analysis of NOV27

gnl|Smart|smart00037, CNX, Connexin homologues; Connexin channels
participate in the regulation of signaling between developing and
differentiated cell types.

CD-Length = 34 residues, 97.1% aligned 

Score = 64.7 bits (156), Expect = 9e-12 



Gap junctions were first characterized by electron microscopy as regionally specialized
structures on plasma membranes of contacting adherent cells. These structures were shown to
consist of cell-to-cell channels. The proteins, called connexins, purified from fractions
enriched in gap junctions differ from tissue to tissue. The connexins are designated by their
molecular mass. Another system of nomenclature divides gap junction proteins into 2 categories,
alpha and beta, according to sequence similarities at the nucleotide and amino acid levels. For
example, CX43 is designated alpha-1 gap junction protein, whereas CX32 and CX26 are
called beta-1 and beta-2 gap junction proteins, respectively. This nomenclature emphasizes
that CX32 and CX26 are more homologous to each other than either of them is to CX43. The
connexins are a family of integral membrane proteins that oligomerise to form intercellular
channels that are clustered at gap junctions. These channels are specialised sites of cell-cell
contact that allow the passage of ions, intracellular metabolites and messenger molecules (with
molecular weight <1-2 kD) from the cytoplasm of one cell to its apposing neighbours. They
are found in almost all vertebrate cell types, and somewhat similar proteins have been cloned
from plant species. Invertebrates utilise a different family of molecules, innexins, that share a
similar predicted secondary structure to the vertebrate connexins, but have no sequence
identity to them. Vertebrate gap junction channels are thought to participate in diverse
biological functions. For instance, in the heart they permit the rapid cell-cell transfer of action
potentials, ensuring coordinated contraction of the cardiomyocytes. They are also responsible
for neurotransmission at specialised 'electrical' synapses. In non-excitable tissues, such as the
liver, they may allow metabolic cooperation between cells. In the brain, glial cells are
extensively coupled by gap junctions; this allows waves of intracellular Ca2+ to propagate
through nervous tissue, and may contribute to their ability to spatially buffer local changes in
extracellular K+ concentration. The connexin protein family is encoded by at least 13 genes in
rodents, with many homologues cloned from other species. They show overlapping tissue
expression patterns, most tissues expressing more than one connexin type. Their conductances,
permeability to different molecules, phosphorylation and voltage-dependence of their gating
have been found to vary. Possible communication diversity is increased further by the fact that
gap junctions may be formed by the association of different connexin isoforms from apposing
cells. However, in vitro studies have shown that not all possible combinations of connexins
produce active channels. Hydropathy analysis predicts that all cloned connexins share a
common transmembrane (TM) topology. Each connexin is thought to contain 4 TM domains,
with two extracellular and three cytoplasmic regions. This model has been validated for
several of the family members by in vitro biochemical analysis. Both N- and C-termini are
thought to face the cytoplasm, and the third TM domain has an amphipathic character,
suggesting that it contributes to the lining of the formed channel. Amino acid sequence
identity between the isoforms is approximately 50-80%, with the TM domains being well
conserved. Both extracellular loops contain characteristically conserved cysteine residues,
which likely form intramolecular disulphide bonds. By contrast, the single putative
intracellular loop (between TM domains 2 and 3) and the cytoplasmic C-terminus are highly
variable among the family members. Six connexins are thought to associate to form a
hemi-channel, or connexon. Two connexons then interact (likely via the extracellular loops of
their connexins) to form the complete gap junction channel.

The disclosed NOV27 nucleic acid of the invention encoding a connexin-like protein
includes the nucleic acid whose sequence is provided in Table 27A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 27A while still encoding a protein that maintains
its connexin-like activities and physiological functions, or a fragment of such a nucleic acid.
The invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 6 percent of the
bases may be so changed.
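
As an arithmetic illustration of the percentage stated above, the following sketch converts "up to about 6 percent" of the 1120-nucleotide NOV27 sequence into a base count and compares a candidate variant against that count. It rests on two assumptions that are not part of the disclosure: the word "about" is treated as a hard cutoff, and only same-length substitutions are counted (insertions and deletions are not handled). All function names are illustrative.

```python
def max_changes(length: int, percent: int) -> int:
    """Largest whole number of positions that is at most `percent` of `length`."""
    return length * percent // 100


def count_substitutions(reference: str, variant: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(reference, variant))


def within_limit(reference: str, variant: str, percent: int) -> bool:
    """True if the variant differs from the reference at no more than
    `percent` percent of positions."""
    allowed = max_changes(len(reference), percent)
    return count_substitutions(reference, variant) <= allowed


print(max_changes(1120, 6))  # 67
```

Under these assumptions, a variant of the Table 27A sequence with 67 or fewer substituted bases would fall within the stated range.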

The disclosed NOV27 protein of the invention includes the connexin-like protein
whose sequence is provided in Table 27B. The invention also includes a mutant or variant
protein any of whose residues may be changed from the corresponding residue shown in Table
27B while still encoding a protein that maintains its connexin-like activities and physiological
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 0
percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this connexin-like
protein (NOV27) may function as a member of a "connexin family". Therefore, the NOV27
nucleic acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV27 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the connexin-like protein
(NOV27) may be useful in gene therapy, and the connexin-like protein (NOV27) may be
useful when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from Cardiomyopathy, Atherosclerosis, Hypertension, Congenital heart defects, Aortic stenosis,
Atrial septal defect (ASD), Atrioventricular (A-V) canal defect, Ductus arteriosus, Pulmonary
stenosis, Subaortic stenosis, Ventricular septal defect (VSD), valve diseases, Tuberous
sclerosis, Scleroderma, Obesity, Transplantation, Diabetes, Von Hippel-Lindau (VHL)
syndrome, Pancreatitis, Obesity, Endometriosis, Fertility, Hemophilia,
Hypercoagulation, Idiopathic thrombocytopenic purpura, Immunodeficiencies, Graft versus
host, Autoimmune disease, Renal artery stenosis, Interstitial nephritis, Glomerulonephritis,
Polycystic kidney disease, Systemic lupus erythematosus, Renal tubular acidosis, IgA
nephropathy, Hypercalcemia, Lesch-Nyhan syndrome, Von Hippel-Lindau (VHL) syndrome,
Alzheimer's disease, Stroke, Tuberous sclerosis, Hypercalcemia, Parkinson's disease,
Huntington's disease, Cerebral palsy, Epilepsy, Lesch-Nyhan syndrome, Multiple
sclerosis, Ataxia-telangiectasia, Leukodystrophies, Behavioral disorders, Addiction, Anxiety,
Pain, Neuroprotection, or other pathologies or conditions. The NOV27 nucleic acid encoding
the connexin-like protein of the invention, or fragments thereof, may further be useful in
diagnostic applications, wherein the presence or amount of the nucleic acid or the protein is
to be assessed.

NOV27 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immunospecifically to the novel NOV27 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV27 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding the pathology of the disease and in developing new drug targets for various
disorders.
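
The paragraph above refers to prediction from hydrophobicity charts without naming a scale or a window size. As one possible illustration, the sketch below computes a sliding-window hydropathy profile and lists windows whose average falls below a cutoff. The Kyte-Doolittle scale, the 9-residue window and the -1.5 cutoff are assumptions chosen for the example and are not taken from the disclosure; the function names are likewise illustrative.

```python
# Kyte-Doolittle hydropathy values (assumed scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> list:
    """Average hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophilic_windows(seq: str, window: int = 9, threshold: float = -1.5) -> list:
    """Return (1-based start, window sequence, average) for each window whose
    average hydropathy is at or below the threshold."""
    return [
        (i + 1, seq[i:i + window], round(avg, 2))
        for i, avg in enumerate(hydropathy_profile(seq, window))
        if avg <= threshold
    ]


# A stretch of the Table 27B sequence (SEQ ID NO:64) used as sample input.
segment = "SGASGHAEGRRTDEEGGREEEGAPAPPG"
for start, peptide, avg in hydrophilic_windows(segment):
    print(start, peptide, avg)
```

Windows reported by such a scan are only candidates; whether a given stretch is exposed and immunogenic would still have to be established experimentally.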



NOV28 

A disclosed NOV28 nucleic acid of 1234 nucleotides (also referred to as CG57662-01)
encoding a connexin-like protein is shown in Table 28A. The start and stop codons are in bold
letters.



Table 28A. NOV28 nucleotide sequence (SEQ ID NO:65). 



TAATAATCTTTTTTTAAAACTCCCTAACAGGATGTGTGGCAGGTTCCTGAGGTGGTGGCTGCTGGCGGAG 
GAGAGCTGGCACTCCACCCCCGTGGGGCGCCTCCTGTTTCCCGTGCTCCTGGGATTCCGCCTTGTGCTGC 
TGGCTGCCAGTGGGCCTGGAGTCTATGGCGATGAGCAGAGTGAATTCGTGTGTCACACCCAGCAGCCGGG 
CTGCAAGGCTGCCTGCTTCGATGCCTTCCACCCGCTCTCCCCGCTGCGTTTCTGGGTCTTCCAGGTCATC 
TTGGTGGCTGTACCTAGCGTCCTCTACATGGGTTTCACTCTGTATCACGTGATCTGGCACTGGGAAGAAT 
CAAGAAAGGGGACGGAGGAAGAGGACACCCTGATCCAGGGAGGGGAGAGCAGCAGAGATACCCCAGGGGC 
TGGAAGCCTCAGGCTGCTCCGAGCTTATGTGGCTCAGCTGGGAGCTCAGCTGGTCCTGGAGGGGACAGCG 
CCGGGGTTGCAGTACCACCTGTATGGGTTCCAGATGCCCAGCTCCTTTGCATGTGGCCAAGAGCCTTGCC 
CGTATAGATTAACTTGCACCTTTTCCCACCCCTCGGAGAAGATCATCTTTCTAAAAGCCATGTTTGGGGT 
CAGTGGGTTCCGTCTCTTGTTCACTCTTTTGGAGATTGTGCTTCTGGGTCTGGGAAGACTGTGTAAGCCC 
CTGCGGAACTTCCTGGGTGGGGCCTCTTCCTCCAGCCACGCCCTGGCCCTGAGCAGCAAAAGGAACCTCC 
AGCAGACACTGGGAGCCATCCATCGGCCTGGTCAGCCTTGTTCCATTTCAGAGACCATGTTCCCCACAGC 
CCCAGTGACTAGGGGTGACATCTCCCGACCTCCCCCACCTGTGGATATGGCCAAGTCGAGGTACCGGTTA 
ACCAAAGATGCTGAAGGAGTGAAGAACCAGCCATCCCCTAATACGCAGGATGGTTATATTGATTATGTCA
AACTGAAAACTTTGGAGAAACTCCTCTCTCAGAAAGCGATAACTGGGCCAGACACGGTGGCTCATGCCTG
TAATCCCAGCATTTTGGGAGGCCTAGGCAGGTGGATCACTGGAGGTCAGGAGTTCAAGACCAGCCAGGCC
AACATGGTGAAACCCGTGTCTACTAAAACTACAAAAATTCTGGGCATGGTGGTGGGCGTCTGTAATCCCA 
GCTACTTGAGAGGCTGAGGCAGGAGAATTGCTTGAACCTGGGAG 



In a search of public sequence databases, the NOV28 nucleic acid sequence, located on
chromosome 7, has 206 of 244 bases (84%) identical to a
gb:GENBANK-ID:AP000692|acc:AP000692.1 mRNA from Homo sapiens (Homo sapiens genomic DNA,
chromosome 21q22.2, PAC clone:24J14, CBR1-HLCS region). Public nucleotide databases
include all GenBank databases and the GeneSeq patent database.

The disclosed NOV28 polypeptide (SEQ ID NO: 66) encoded by SEQ ID NO: 65 has 
391 amino acid residues and is presented in Table 28B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV28 has a signal peptide and is 
likely to be localized in the plasma membrane with a certainty of 0.6000. The most likely 
cleavage site for a NOV28 peptide is between amino acids 46 and 47. 
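
A predicted cleavage position of this kind divides the encoded sequence into a signal peptide and a remaining mature portion. The sketch below performs that split for a cleavage "between amino acids 46 and 47", using 1-based residue numbering as in the text. The function name `split_at_cleavage` is illustrative, and the input is only the first 60 residues of the Table 28B sequence (SEQ ID NO:66).

```python
from typing import Tuple


def split_at_cleavage(protein: str, last_signal_residue: int) -> Tuple[str, str]:
    """Split a protein after the given 1-based residue number.

    For a cleavage site between residues 46 and 47, pass 46: the first
    element holds residues 1-46 and the second starts at residue 47.
    """
    protein = "".join(protein.split())  # drop whitespace left by line wrapping
    if not 0 < last_signal_residue < len(protein):
        raise ValueError("cleavage position must fall inside the sequence")
    return protein[:last_signal_residue], protein[last_signal_residue:]


# First 60 residues of the Table 28B sequence (SEQ ID NO:66).
n_terminal = "MCGRFLRWWLLAEESWHSTPVGRLLFPVLLGFRLVLLAASGPGVYGDEQSEFVCHTQQPG"
signal, mature = split_at_cleavage(n_terminal, 46)
print(signal[-6:], mature[:6])  # GPGVYG DEQSEF
```

The same call with 26 as the second argument applies to the NOV27 prediction described earlier.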

Table 28B. Encoded NOV28 protein sequence (SEQ ID NO:66). 

MCGRFLRWWLLAEESWHSTPVGRLLFPVLLGFRLVLLAASGPGVYGDEQSEFVCHTQQPG 
CKAACFDAFHPLSPLRFWVFQVILVAVPSVLYMGFTLYHVIWHWEESRKGTEEEDTLIQG 
GESSRDTPGAGSLRLLRAYVAQLGAQLVLEGTAPGLQYHLYGFQMPSSFACGQEPCPYRL 
TCTFSHPSEKIIFLKAMFGVSGFRLLFTLLEIVLLGLGRLCKPLRNFLGGASSSSHALAL 
SSKRNLQQTLGAIHRPGQPCSISETMFPTAPVTRGDISRPPPPVDMAKSRYRLTKDAEGV 
KNQPSPNTQDGYIDYVKLKTLEKLLSQKAITGPDTVAHACNPSILGGLGRWITGGQEFKT
SQANMVKPVSTKTTKILGMVVGVCNPSYLRG



A search of sequence databases reveals that the NOV28 amino acid sequence has 
108/108 (100%) amino acids identical with ptnr:SPTREMBL-ACC:O60387 
WUGSC:H_DJ0604G05.3 PROTEIN - Homo sapiens (Human). Public amino acid databases 
include the GenBank databases, SwissProt, PDB and PIR. 

NOV28 is expressed in at least Brain, Breast, Colon, Gall bladder, Germ Cell, Heart, 
Kidney, Liver, Ovary, Pancreas, Prostate, Stomach, Testis, Whole embryo, brain, breast, 
breast normal, colon, colon ins, head_neck, lung, nervous_tumor, prostate, prostate normal, 
prostate_tumor, stomach. This information was derived by determining the tissue sources of 
the sequences that were included in the invention including but not limited to SeqCalling 
sources, Public EST sources, Literature sources, and/or RACE sources. 

The disclosed NOV28 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 28C.

Table 28C. BLAST results for NOV28

Gene Index/Identifier                     Protein/Organism                                                             Length (aa)  Identity (%)   Positives (%)  Expect
gi|3006230|gb|AAC09485.1| (AC004522)      gap junction protein; similar to P36383 (PID:g544117) [Homo sapiens]         207          207/250 (82%)  207/250 (82%)  e-103
gi|17978264|ref|NP_536698.1| (NM_080450)  gap junction membrane channel protein epsilon 1; connexin 29 [Mus musculus]  258          135/219 (61%)  167/219 (75%)  2e-75
gi|18566654|ref|XP_095131.1| (XM_095131)  hypothetical protein XP_095131 [Homo sapiens]                                222          100/115 (86%)  103/115 (88%)  3e-51
gi|6680007|ref|NP_032148.1| (NM_008122)   gap junction membrane channel protein alpha 7; connexin 45 [Mus musculus]    396          96/250 (38%)   136/250 (54%)  5e-41
gi|4885169|ref|NP_005488.1| (NM_005497)   gap junction protein, alpha 7, 45kD (connexin 45) [Homo sapiens]             396          96/250 (38%)   136/250 (54%)  8e-41



Table 28D lists the domain descriptions from DOMAIN analysis results against 
NOV28. This indicates that the NOV28 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 28D. Domain Analysis of NOV28 



gnl|Pfam|pfam00029, connexin, Connexin.

CD-Length = 218 residues, 90.4% aligned

Score = 173 bits (438), Expect = 2e-44



The disclosed NOV28 nucleic acid of the invention encoding a connexin-like protein
includes the nucleic acid whose sequence is provided in Table 28A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 28A while still encoding a protein that maintains
its connexin-like activities and physiological functions, or a fragment of such a nucleic acid.
The invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 16 percent of the
bases may be so changed.

The disclosed NOV28 protein of the invention includes the connexin-like protein
whose sequence is provided in Table 28B. The invention also includes a mutant or variant
protein any of whose residues may be changed from the corresponding residue shown in Table
28B while still encoding a protein that maintains its connexin-like activities and physiological
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 0
percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this connexin-like
protein (NOV28) may function as a member of a "connexin family". Therefore, the NOV28
nucleic acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV28 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the connexin-like protein
(NOV28) may be useful in gene therapy, and the connexin-like protein (NOV28) may be
useful when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from Cardiomyopathy, Atherosclerosis, Hypertension, Congenital heart defects, Aortic stenosis,
Atrial septal defect (ASD), Atrioventricular (A-V) canal defect, Ductus arteriosus, Pulmonary
stenosis, Subaortic stenosis, Ventricular septal defect (VSD), valve diseases, Tuberous
sclerosis, Scleroderma, Obesity, Transplantation, Diabetes, Von Hippel-Lindau (VHL)
syndrome, Pancreatitis, Obesity, Endometriosis, Fertility, Hemophilia,
Hypercoagulation, Idiopathic thrombocytopenic purpura, Immunodeficiencies, Graft versus
host, Autoimmune disease, Renal artery stenosis, Interstitial nephritis, Glomerulonephritis,
Polycystic kidney disease, Systemic lupus erythematosus, Renal tubular acidosis, IgA
nephropathy, Hypercalcemia, Lesch-Nyhan syndrome, or other pathologies or conditions. The
NOV28 nucleic acid encoding the connexin-like protein of the invention, or fragments thereof,
may further be useful in diagnostic applications, wherein the presence or amount of the nucleic
acid or the protein is to be assessed.

NOV28 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immunospecifically to the novel NOV28 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV28 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding the pathology of the disease and in developing new drug targets for various
disorders.



NOV29 

A disclosed NOV29 nucleic acid of 1400 nucleotides (also referred to as CG57664-01) 
encoding a MHC Class I antigen-like protein is shown in Table 29A. 



Table 29A. NOV29 nucleotide sequence (SEQ ID NO:67). 

ATTCTCCCCAAACGCCAGGGATGGGGGTCATGGCTCCCCGAACCCTCCTCCTGCTGCTCTTGGGGGCCCT 
GGCCCTGACCGAGACCTGGGCCGGTGAGTGCGGGGTCGGGAGGGAAAGGGCCTCTGCGGGGAGAAGCGAG 
TGGCCCGCCCGGCCCGGGGAGCCGCGCCTCAGCCTCTCCTCGCCTCCAGGCTCCCACTCCTTGAGGTATT 
TCAGCACCGCAGTGTCCCAGCCCGGCCGCGGGGAGCCCCGGTTCATCGCCGTGGGCTACGTGGACGACAC 
AGAGTTCGTGCGGTTCGACAGCGACTCCGTGAGTCCGAGGATGGAGCGGCGGGCGCCGTGGGTGGAGCAG 
GAGGGGCTGGAGTATTGGGACCAGGAGACACGGAACGCCAAGGGCCACGCGCAGATTTACCGAGTGAACC 
TGCGGACCCTGCTCCGCTATTACAACCAGAGCGAGGCCGGTGGTTCTCACACCATCCAGAGGAAGCATGA
CTGCGACGTGGGCCCGACAGGCGGGCCCGACAGGCGCCTCCTCCGCAGGTATGAACAGTTCGCCTACGAT 
GGCAAGGATTACATCGCCCTGAACGAGGACCTGCCCTCCTGGACCGCCGCGAACACAGCGGCTCAGATCT 
CCCAGCACAAGTGGGAAGCGGACAAATACTCAGAGCAGGTCAGGGCCTACCTGAGGGCAAGTGCATGGAG 
TGGCGAGGGCAAGTGCATGGAGTGGCTCCGCAGACACCTGGAGAACGGGAAGGAGACGCTGCAGCGCGCG 
TCAGATCCCCCAAAGGCACATGTGACCCAGCACCCCGTCTCTGACCATGAGGCCACCCTTGAGGTGCTGG 
GCCCTGGGCCTCTACCCTTGAGGTGCTGGGCCCTGGGCCTCTACCCTGCGGAGATCACACTGACCTGGCA 
GCAGGATGGGGAGGACCAGACCCAGGACACGGAGCTTGTGGAGACCAGGCCTGCAGGGGACGGAACCTTC 
CAGAAGTGGGTGGCTGTAGTGGTGCCTTCCGGAGAGGAGCAGAGATACATGTGCCATGTGCAGCATGAGG
GGCTGCCAGAGCCCCTCACCCTGAGATGGCCCTCACCTCCCTCTCCTTTCCCAGAGCCGTCTTCTCAGCC 
CACCATCCCCATCGTGGGCATCGTTGCTGGCCTGTTTCTCCTTGGAGCTGTGGTCACTGGAGCTGTGGTT 
GCTGCTGTGATGAAGAGGAAGAAAAGCTCAGGTAGGGAAGGGGTGAGAGGTGGGATCTGGGTTTTCTTGT 
TCCACTGTGGGTTTCAAGCCACAGGTAGAATTGTGACTTGCTTCATCACTGGGAAGCACCGTCCACACAC 
AGGCCGACCTAGCCTGGGGCCCTGTGTGCCAACACTTGCTCTTTTGTGAAGCACATGTGAAAACGAAGGA

In a search of public sequence databases, the NOV29 nucleic acid sequence, located on
chromosome 6, has 332 of 353 bases (94%) identical to a gb:GENBANK-
ID:HUMHLA92|acc:M96338.1 mRNA from Homo sapiens (Homo sapiens HLA-92 gene
sequence). Public nucleotide databases include all GenBank databases and the GeneSeq
patent database.
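
The identity figure quoted above is the number of matching positions divided by the number of aligned positions. The following is a minimal sketch of that arithmetic, assuming the percentage is truncated rather than rounded (an assumption that agrees with figures quoted in this section, such as 158/223 reported as 70%). The function name `percent_identity` is hypothetical and is not part of any search tool.

```python
def percent_identity(matches: int, aligned_length: int) -> int:
    """Return identity as a whole percentage, truncated toward zero."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= matches <= aligned_length:
        raise ValueError("matches must lie between 0 and aligned_length")
    # Integer arithmetic avoids floating-point rounding surprises.
    return (100 * matches) // aligned_length


# NOV29 nucleotide hit: 332 of 353 bases identical.
print(percent_identity(332, 353))  # 94
```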

The disclosed NOV29 polypeptide (SEQ ID NO:68) encoded by SEQ ID NO:67 has 
452 amino acid residues and is presented in Table 29B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV29 has a signal peptide and is 
likely to be localized at the plasma membrane with a certainty of 0.4600. The most likely 
cleavage site for a NOV29 peptide is between amino acids 24 and 25.

Table 29B. Encoded NOV29 protein sequence (SEQ ID NO:68). 

LRYFSTAVSQPGRGEPRFIAVGYVDDTEFVRFDSDSVSPRMERRAPWVEQEGLEYWDQET
RNAKGHAQIYRVNLRTLLRYYNQSEAGGSHTIQRKHDCDVGPTGGPDRRLLRRYEQFAYD
GKDYIALNEDLPSWTAANTAAQISQHKWEADKYSEQVRAYLRASAWSGEGKCMEWLRRHL
ENGKETLQRASDPPKAHVTQHPVSDHEATLEVLGPGPLPLRCWALGLYPAEITLTWQQDG
EDQTQDTELVETRPAGDGTFQKWVAVWPSGEEQRYMCHVQHEGLPEPLTLRWPSPPSPF
PEPSSQPTIPIVGIVAGLFLLGAWTGAWAAVMKRKKSSGREGVRGGIWVFLFHCGFQA
TGRIVTCFITGKHRPHTGRPSLGPCVPTLtALL 

A search of sequence databases reveals that the NOV29 amino acid sequence has 
158/223 (70%) identity and 177/223 (79%) similarity with SPTREMBL-ACC:Q9TPL2 MHC 
CLASS I ANTIGEN - Pan troglodytes (Chimpanzee). Public amino acid databases include the 
GenBank databases, SwissProt, PDB and PIR.

NOV29 is expressed in at least Bone Marrow, Dermis, Hippocampus, Placenta, and
Tonsils. Expression information was derived from the tissue sources of the sequences that
were included in the derivation of the sequence of CG57664-01. The sequence is predicted to
be expressed in the following tissues because of the expression pattern of (GENBANK-ID:
gb:GENBANK-ID:HUMHLA92|acc:M96338.1) a closely related Homo sapiens HLA-92 gene
sequence homolog in species Homo sapiens: Lymphoblastoid cell line. This information was
derived by determining the tissue sources of the sequences that were included in the invention
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or
RACE sources.

The disclosed NOV29 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 29C.

Table 29C. BLAST results for NOV29


Gene Index/Identifier                           Protein/Organism                                                            Length (aa)  Identity (%)   Positives (%)  Expect
gi|8117819|gb|AAF72785.1|AF168404_1 (AF168404)  MHC class I antigen [Pan troglodytes]                                       365          275/404 (68%)  300/404 (74%)  e-141
gi|8117799|gb|AAF72776.1|AF168395_1 (AF168395)  MHC class I antigen [Pan troglodytes]                                       365          276/404 (68%)  301/404 (74%)  e-141
gi|2118771|pir||I54493                          MHC class I histocompatibility antigen HLA-A alpha chain precursor - human  365          271/404 (67%)  299/404 (73%)  e-141
gi|8117808|gb|AAF72780.1|AF168399_1 (AF168399)  MHC class I antigen [Pan troglodytes]                                       365          275/404 (68%)  300/404 (74%)  e-140
gi|2251146|emb|CAA65501.1| (X96724)             human leukocyte antigen [Homo sapiens]                                      365          270/404 (66%)  300/404 (73%)  e-140


Tables 29D-E list the domain descriptions from DOMAIN analysis results against 
NOV29. This indicates that the NOV29 sequence has properties similar to those of other 
proteins known to contain this domain. 


Table 29D. Domain Analysis of NOV29 

gnl|Pfam|pfam00129, MHC_I, Class I Histocompatibility antigen, domains
alpha 1 and 2

CD-Length = 179 residues, 96.6% aligned 

Score = 248 bits (634), Expect = 4e-67 



Table 29E. Domain Analysis of NOV29 

gnl|Smart|smart00407, IGc1, Immunoglobulin C-Type

CD-Length = 75 residues, 96.0% aligned 

Score = 59.7 bits (143), Expect = 4e-10 
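
The "CD-Length" and "% aligned" figures in Tables 29D and 29E can be converted into an approximate count of aligned domain residues. The sketch below assumes that "% aligned" expresses the fraction of the conserved-domain model covered by the alignment; that reading, and the helper name `aligned_residues`, are assumptions rather than anything stated in the text.

```python
def aligned_residues(cd_length: int, percent_aligned: float) -> int:
    """Approximate number of domain-model residues covered by the alignment."""
    if cd_length <= 0:
        raise ValueError("cd_length must be positive")
    if not 0.0 <= percent_aligned <= 100.0:
        raise ValueError("percent_aligned must be between 0 and 100")
    return round(cd_length * percent_aligned / 100.0)


# Table 29D: MHC_I domain, 179 residues, 96.6% aligned.
print(aligned_residues(179, 96.6))  # 173
# Table 29E: IGc1 domain, 75 residues, 96.0% aligned.
print(aligned_residues(75, 96.0))   # 72
```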



The major histocompatibility complex (MHC) encodes the class I and class II families
of glycoproteins that present peptides for immunorecognition by cytotoxic and helper T
lymphocytes, respectively. Class I molecules bind peptides generated by degradation of
proteins intracellularly, whereas class II molecules associate mainly with peptides derived
from endocytosed extracellular proteins. Two genes encode components of the proteasome
complex, which degrades cytosolic proteins and may generate antigenic peptides. Two closely
linked genes, PSF1 and PSF2, encode subunits of a transporter, which presumably translocates
peptides into an exocytic compartment where they associate with class I molecules. The
location of these genes in the MHC in close linkage to the class I and class II gene families
suggests that they coevolved to optimize functional interactions.

The disclosed NOV29 nucleic acid of the invention encoding a MHC Class I antigen-
like protein includes the nucleic acid whose sequence is provided in Table 29A or a fragment
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may
be changed from the corresponding base shown in Table 29A while still encoding a protein
that maintains its MHC Class I antigen-like activities and physiological functions, or a 
fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences 
are complementary to those just described, including nucleic acid fragments that are 
complementary to any of the nucleic acids just described. The invention additionally includes 
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include 
chemical modifications. Such modifications include, by way of nonlimiting example, 
modified bases, and nucleic acids whose sugar phosphate backbones are modified or 
derivatized. These modifications are carried out at least in part to enhance the chemical 
stability of the modified nucleic acid, such that they may be used, for example, as antisense 
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic 
acids, and their complements, up to about 6 percent of the bases may be so changed. 
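
The "up to about 6 percent" limit can be read as a bound on the fraction of positions at which a variant differs from the Table 29A sequence. The following is a minimal sketch of such a check, assuming substitutions only (no insertions or deletions) and sequences of equal length. The names `count_differences`, `within_limit` and `mutate` are hypothetical, the 100-base reference is a toy string rather than a disclosed sequence, and the 6 percent default simply restates the figure in the text.

```python
def count_differences(reference: str, variant: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sketch handles substitutions only; lengths must match")
    return sum(a != b for a, b in zip(reference.upper(), variant.upper()))


def within_limit(reference: str, variant: str, max_percent: float = 6.0) -> bool:
    """True if at most max_percent of positions differ from the reference."""
    if not reference:
        raise ValueError("reference must be non-empty")
    changed = count_differences(reference, variant)
    return changed * 100 <= max_percent * len(reference)


def mutate(seq: str, positions) -> str:
    """Toy helper: substitute a different base at each listed position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for i in positions:
        chars[i] = swap[chars[i]]
    return "".join(chars)


reference = "ACGT" * 25  # 100-base toy reference, not a real sequence
print(within_limit(reference, mutate(reference, range(6))))  # True: 6 of 100 changed
print(within_limit(reference, mutate(reference, range(7))))  # False: 7 of 100 changed
```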

The disclosed NOV29 protein of the invention includes the MHC Class I antigen-like 
protein whose sequence is provided in Table 29B. The invention also includes a mutant or 
variant protein any of whose residues may be changed from the corresponding residue shown 
in Table 29B while still encoding a protein that maintains its MHC Class I antigen-like 
activities and physiological functions, or a functional fragment thereof. In the mutant or 
variant protein, up to about 30 percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this MHC Class I 
antigen-like protein (NOV29) may function as a member of a "MHC Class I antigen family". 
Therefore, the NOV29 nucleic acids and proteins identified here may be useful in potential 
therapeutic applications implicated in (but not limited to) various pathologies and disorders as 
indicated below. The potential therapeutic applications for this invention include, but are not 
limited to: protein therapeutic, small molecule drug target, antibody target (therapeutic, 
diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene 
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therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of 
all tissues and cell types composing (but not limited to) those defined here. 

The NOV29 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the MHC Class I antigen-
like protein (NOV29) may be useful in gene therapy, and the MHC Class I antigen-like protein
(NOV29) may be useful when administered to a subject in need thereof. By way of
nonlimiting example, the compositions of the present invention will have efficacy for
treatment of patients suffering from Von Hippel-Lindau (VHL) syndrome, Alzheimer's
disease, Stroke, Tuberous sclerosis, hypercalcemia, Parkinson's disease, Huntington's disease,
Cerebral palsy, Epilepsy, Lesch-Nyhan syndrome, Multiple sclerosis, Ataxia-
telangiectasia, Leukodystrophies, Behavioral disorders, Addiction, Anxiety, Pain,
Neuroprotection, Tonsillitis, Hemophilia, hypercoagulation, Idiopathic thrombocytopenic
purpura, autoimmune disease, allergies, immunodeficiencies, transplantation, Graft versus host,
or other pathologies or conditions. The NOV29 nucleic acid encoding the MHC Class I
antigen-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV29 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV29 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-
NOVX Antibodies" section below. The disclosed NOV29 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.
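
Prediction of hydrophilic regions from a hydrophobicity chart can be illustrated with a sliding-window average. The sketch below assumes the Kyte-Doolittle hydropathy scale, a 7-residue window and a cutoff of -1.5; the text names none of these, so all three are illustrative choices, and the function names are hypothetical. Windows with a strongly negative mean are treated as hydrophilic candidates.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the text does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 7) -> list:
    """Mean hydropathy of every full window along the sequence."""
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophilic_windows(protein: str, window: int = 7, threshold: float = -1.5) -> list:
    """0-based start indices of windows whose mean hydropathy is below threshold."""
    profile = hydropathy_profile(protein, window)
    return [i for i, score in enumerate(profile) if score < threshold]


# Second printed row of Table 29B, used here only as sample input.
excerpt = "RNAKGHAQIYRVNLRTLLRYYNQSEAGGSHTIQRKHDCDVGPTGGPDRRLLRRYEQFAYD"
for start in hydrophilic_windows(excerpt):
    print(start + 1, excerpt[start:start + 7])  # 1-based start and window residues
```

A real immunogen selection would also weigh surface exposure and sequence uniqueness; this sketch covers only the hydropathy step.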

NOV30 

A disclosed NOV30 nucleic acid of 1225 nucleotides (also referred to as CG57666-01)
encoding a MHC Class I antigen-like protein is shown in Table 30A. The start and stop
codons are in bold letters.






Table 30A. NOV30 nucleotide sequence (SEQ ID NO:69). 



ACGCCGAGGATGGGGTCATGGCGTCCCAAACCCTCCTCCTGCTGCTCTTGGGGGCCCTGGCCCTGACCGA 
GACCTGGGCGGGTACCCACTCCATAAGGTATTTCAGCACCGCCGTGTCCCGGCCGGGTCGCGGGGAGCCC
CGGGGTACCCACTCCATAAGGTATTTCAGCACCGCCGTGTCCCGGCCGGGTCGCGGGGAGCCCCGGTACA 
TCGCAGTGGGCTACGTGGACGACACGCAGTTCGTGCGGTTCGACAGCGACGCGGCGACTCCGAGGATGGA 
GCCGCAGGCGCCGTGGTTGGAGCAGGAGGGACCGGAGTATTGGGACCGGAGCACACCGAACATCAGGCCC 
GCGCACAGACTGACAAGAGTGAACCTGCCCATGCCGCGCCGCTACTACCACCAGAGCGGGTCTAACACCC 
TCCAGATAATGTATGGCTGCGACTTGGGGCTGGAAGGGCGCCTCCTCCGCGGGTATGAACAGCACGCCAA 
CGATGGCAAAGATTACATCGCCCTAAACGAGGACCTGAGCTCTTGGACCGCGGCGGCCATGGCGGCTCAG 
ATTACCCAGCGCAAGTGGGAGGCGGCCCATGAGGCGGAGCAGCAGAGAGCCTACCTGGAGGGCACGTGCG
TGGAGTGGCTCCGCAGATACCTGGAGAACGGGAAGGAGACGCTGCAGCGCACTACCCCCCCCCCCAAGAC 
ACATATGATCCACCATTCCGTCTCTGACTATAAGGCCACCCTGAGATGCTGGGCCCTGGGCTTCTACCCT 
GTGGAGATCACACTGACCTGGCAGCAGGATGGAGAGGACCAGACTCAGGACATGGAGCTTGTAGAGACCA
GGCCTGCAGGGGATGGAAACTTCCAGAAGTGGGCAGCTGTGGTGGTGCCTTCTGGAGAGGAACAGAGATA
CATGTGCCATGTGCAGCATGAGGGGTTGCCCAAGCCCCTCACCCTGAGATGGGAGCAGTCTTCTCAGCCC 
ACCATCCCCATCGTGGGTATCGTTGCTGGCCTGGTTCTCCTTGGAGCTGTAGTCACTGGAGCTGTGGTTT 
CTGCTGTGATGTGCAGGAAGAACTCATTTTGTTCTACCCCAGGCAGCAACCATGCGCAGGGTTCTGATGT 
GTCTCTCACGGCTTGTAAAGGTGAGACGCTGGGGGACCTGATGTGTGGGGGGTGTTGGGGGCAATAGTGG 
ATGCAGCTGTGCTATGGGGTTTCTTTGAATTGGAT 






In a search of public sequence databases, the NOV30 nucleic acid sequence, located on 
chromosome 6, has 265 of 271 bases (97%) identical to a gb:GENBANK-
ID:AF055066|acc:AF055066.1 mRNA from Homo sapiens (Homo sapiens MHC class 1
region). Public nucleotide databases include all GenBank databases and the GeneSeq patent 
database. 

The disclosed NOV30 polypeptide (SEQ ID NO:70) encoded by SEQ ID NO:69 has 
389 amino acid residues and is presented in Table 30B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV30 has a signal peptide and is 
likely to be localized at the plasma membrane with a certainty of 0.4600. The most likely 
cleavage site for a NOV30 peptide is between amino acids 21 and 22. 
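
The relationship between the Table 30A nucleotide sequence, the Table 30B protein and the predicted cleavage site can be sketched as a translation followed by a split after residue 21. The sketch assumes the standard genetic code and uses only the first 81 bases of the coding region as printed in Table 30A, starting at the ATG that encodes the first residue of Table 30B. The names `translate` and `split_signal_peptide` are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]  # KeyError on non-ACGT input
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def split_signal_peptide(protein: str, last_signal_residue: int):
    """Split a protein after the given 1-based residue number."""
    return protein[:last_signal_residue], protein[last_signal_residue:]


# First 81 bases of the NOV30 coding region (Table 30A), split as printed.
orf_start = (
    "ATGGCGTCCCAAACCCTCCTCCTGCTGCTCTTGGGGGCCCTGGCCCTGACCGA"
    "GACCTGGGCGGGTACCCACTCCATAAGG"
)
peptide = translate(orf_start)
signal, mature_start = split_signal_peptide(peptide, 21)
print(signal)        # MASQTLLLLLLGALALTETWA
print(mature_start)  # GTHSIR
```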



Table 30B. Encoded NOV30 protein sequence (SEQ ID NO:70). 



MASQTLLLLLLGALALTETWAGTHSIRYFSTAVSRPGRGEPRGTHSIRYFSTAVSRPGRG 
EPRYIAVGYVDDTQFVRFDSDAATPRMEPQAPWLEQEGPEYWDRSTPNIRPAHRLTRVNL 
PMPRRYYHQSGSNTLQIMYGCDLGLEGRLLRGYEQHANDGKDYIALNEDLSSWTAAAMAA
QITQRKWEAAHEAEQQRAYLEGTCVEWLRRYLENGKETLQRTTPPPKTHMIHHSVSDYKA 
TLRCWALGFYPVEITLTWQQDGEDQTQDMELVETRPAGDGNFQKWAAVWPSGEEQRYMC 
HVQHEGLPKPLTLRWEQSSQPTIPIVGIVAGLVLLGAWTGAWSAVMCRKNSFCSTPGS 
NHAQGSDVSLTACKGETLGDLMCGGCWGQ 



A search of sequence databases reveals that the NOV30 amino acid sequence has 
258/338 (76%) identity and 284/338 (84%) similarity with SPTREMBL-ACC:Q31602 MHC 
CLASS I ANTIGEN - Homo sapiens. Public amino acid databases include the GenBank
databases, SwissProt, PDB and PIR. 

NOV30 is expressed in at least Bone Marrow, Dermis, Hippocampus, Placenta, and
Tonsils. This information was derived by determining the tissue sources of the sequences that
were included in the invention including but not limited to SeqCalling sources, Public EST
sources, Literature sources, and/or RACE sources.

The disclosed NOV30 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 30C. 




Table 30C. BLAST results for NOV30 


Gene Index/Identifier                           Protein/Organism                                                            Length (aa)  Identity (%)   Positives (%)  Expect
gi|8571968|gb|AAF76942.1|AF179640_1 (AF179640)  MHC class I antigen [Pan troglodytes]                                       365          283/383 (73%)  313/383 (80%)  e-155
gi|8571970|gb|AAF76943.1|AF179641_1 (AF179641)  MHC class I antigen [Pan troglodytes]                                       363          282/383 (73%)  312/383 (80%)  e-155
gi|8117808|gb|AAF72780.1|AF168399_1 (AF168399)  MHC class I antigen [Pan troglodytes]                                       365          282/383 (73%)  312/383 (80%)  e-155
gi|2118771|pir||I54493                          MHC class I histocompatibility antigen HLA-A alpha chain precursor - human  365          280/383 (73%)  312/383 (80%)  e-154
gi|6049049|gb|AAF02442.1| (AF115463)            MHC class I antigen [Pan troglodytes]                                       365          281/383 (73%)  311/383 (80%)  e-153

Tables 30D-E list the domain descriptions from DOMAIN analysis results against 
NOV30. This indicates that the NOV30 sequence has properties similar to those of other 
proteins known to contain this domain. 




Table 30D. Domain Analysis of NOV30 

gnl|Pfam|pfam00129, MHC_I, Class I Histocompatibility antigen, domains
alpha 1 and 2

CD-Length = 179 residues, 100.0% aligned 

Score = 245 bits (626) , Expect = 3e-66 



Table 30E. Domain Analysis of NOV30 

gnl|Smart|smart00407, IGc1, Immunoglobulin C-Type

CD-Length = 75 residues, 100.0% aligned 
Score = 75.5 bits (184), Expect = 5e-15 



The disclosed NOV30 nucleic acid of the invention encoding a MHC Class I antigen-
like protein includes the nucleic acid whose sequence is provided in Table 30A or a fragment
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may
be changed from the corresponding base shown in Table 30A while still encoding a protein
that maintains its MHC Class I antigen-like activities and physiological functions, or a
fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences
are complementary to those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 3 percent of the bases may be so changed.

The disclosed NOV30 protein of the invention includes the MHC Class I antigen-like
protein whose sequence is provided in Table 30B. The invention also includes a mutant or
variant protein any of whose residues may be changed from the corresponding residue shown
in Table 30B while still encoding a protein that maintains its MHC Class I antigen-like
activities and physiological functions, or a functional fragment thereof. In the mutant or
variant protein, up to about 24 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this MHC Class I 
antigen-like protein (NOV30) may function as a member of a "MHC Class I antigen family". 
Therefore, the NOV30 nucleic acids and proteins identified here may be useful in potential 
therapeutic applications implicated in (but not limited to) various pathologies and disorders as
indicated below. The potential therapeutic applications for this invention include, but are not
limited to: protein therapeutic, small molecule drug target, antibody target (therapeutic, 
diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene 
therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of 
all tissues and cell types composing (but not limited to) those defined here. 

The NOV30 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the MHC Class I antigen-
like protein (NOV30) may be useful in gene therapy, and the MHC Class I antigen-like protein
(NOV30) may be useful when administered to a subject in need thereof. By way of
nonlimiting example, the compositions of the present invention will have efficacy for
treatment of patients suffering from Von Hippel-Lindau (VHL) syndrome, Alzheimer's
disease, Stroke, Tuberous sclerosis, hypercalcemia, Parkinson's disease, Huntington's disease,
Cerebral palsy, Epilepsy, Lesch-Nyhan syndrome, Multiple sclerosis, Ataxia-
telangiectasia, Leukodystrophies, Behavioral disorders, Addiction, Anxiety, Pain,
Neuroprotection, Tonsillitis, Hemophilia, hypercoagulation, Idiopathic thrombocytopenic
purpura, autoimmune disease, allergies, immunodeficiencies, transplantation, Graft versus host,
or other pathologies or conditions. The NOV30 nucleic acid encoding the MHC Class I
antigen-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV30 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV30 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti- 
NOVX Antibodies" section below. The disclosed NOV30 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 

NOV31 

A disclosed NOV31 nucleic acid of 1159 nucleotides (also referred to as CG57668-
01) encoding a MHC Class I antigen-like protein is shown in Table 31A. The start and stop
codons are in bold letters.



Table 31A. NOV31 nucleotide sequence (SEQ ID NO:71). 

TCTCCCCAGACGCCGAGGATGGTGCTCATGGCGCCCCGAACCCTCCTCCTGCTGCTCTCAGGGGCCCTGA 
CCCAGACCTGGGCGCGTTCCCACTCCATGAGGTATTTCTACACCACCATGTCCCGGCCCGGCCGCGGGGA 
GCCCCGCTTCATCTCCGTCGGCTACGTGGACTATACGCAGTTCGTGCGGTTCGACAGCGACGACGCGAGT 
CCGAGAGAGGAGCCGCGGGCGCCGTGGATGGAGCGGGAGGGGCCGGAGTATTGGGACCGGAACACACAGA 
TCTGCAAGGCCCAAGCACGGACTGAACGAGAGAACCTGCGGATCGCGCTCCGCTACTACAACCAGAGCGA 
GGGCGGTGGTTCCCACACCATGCAGGTGATGTATGGCTGCGACGTGGGGCCCGACGGGCGCTTCCTCCGC 
GGGTATGAACAGCACGCCTACGACGGCAAGGATTACATCGCTCTGAACGAGGACCTGCGCTCCTGGACCG 
CGGCGGACATGGCAGCTCAGATCACCAAGCGCAAGTGGGAGGCGGCCCGTGTGGCGGAGCAGCTGAGAGC 
CTACCTGGAGGGCGAGTTCGTGGAGTGGCTCCGCAGATACCTGGAGAACGGGAAGGAGACGCTGCAGCGC 
GCGTCAGACCCCCCCAAGACACATATGACCCACTACCCCATCTCTGACCATGAGGCCACCCTGAGGTGCT
GGGCCCTGGGCTTCTACCCTGCGGAGATCACACTGACCTGGCAGCGGGATGGGGAGGACCAGACCACGGA
GCTCGTGGAGACCAGGCCTGCAGGGGATGGAACCTTCCAGAAGTGGGCGGCTGTGGTGGTGCCTTCTGGA 
GAGGAGCAGAGATACACCTGCCATGTGCAGCATGAGGGTCTGCCCGAGCCCCTCACCCTGAGATGGCAGG 
GTCAGGGTCCCTCACCTTCCCCCCTTTTCCCAGAGCCATCTTCCCAGCCCACCATCCCCATCGTGGGCAT 
CATTGCTGGCCTGGTTCTACTTGTAGCTGTGGTCACTGGAGCTGTGGTCACTGCTGTAATGTGGAGGAAG 
AAGAGCTCAGGTAAGGAAGGGGATGGGTATTCTACTCCAGGCGGCAACAGTGCCCAGGGCTCTGATGTGT 
CTCTCACGGCGTGAAAGGTGAGACCTTGGGGGGCCTGAT 



In a search of public sequence databases, the NOV31 nucleic acid sequence, located on
chromosome 6, has 404 of 512 bases (78%) identical to a gb:GENBANK-
ID:AF055066|acc:AF055066.1 mRNA from Homo sapiens (Homo sapiens MHC class 1
region). Public nucleotide databases include all GenBank databases and the GeneSeq patent 
database. 

The disclosed NOV31 polypeptide (SEQ ID NO:72) encoded by SEQ ID NO:71 has
371 amino acid residues and is presented in Table 31B using the one-letter amino acid code.
Signal P, Psort and/or Hydropathy results predict that NOV31 has a signal peptide and is
likely to be localized at the plasma membrane with a certainty of 0.7300. The most likely
cleavage site for a NOV31 peptide is between amino acids 18 and 19.



Table 31B. Encoded NOV31 protein sequence (SEQ ID NO:72).



MVLMAPRTLLLLLSGALTQTWARSHSMRYFYTTMSRPGRGEPRFISVGYVDYTQFVRFDS 
DDASPREEPRAPWMEREGPEYWDRNTQICKAQARTERENLRIALRYYNQSEGGGSHTMQV 
MYGCDVGPDGRFLRGYEQHAYDGKDYIALNEDLRSWTAADMAAQITKRKWEAARVAEQLR 
AYLEGEFVEWLRRYLENGKETLQRASDPPKTHMTHYPISDHEATLRCWALGFYPAEITLT 
WQRDGEDQTTELVETRPAGDGTFQKWAAWVPSGEEQRYTCHVQHEGLPEPLTLRWQGQG 
PSPSPLFPEPSSQPTIPIVGIIAGLVLLVAWTGAWTAVMWRKKSSGKEGDGYSTPGGN 
SAQGSDVSLTA 



A search of sequence databases reveals that the NOV31 amino acid sequence has 
335/368 (91%) identity and 342/368 (92%) similarity with REMTREMBL-ACC:CAB66931 
Gogo-H protein - Gorilla gorilla (gorilla). Public amino acid databases include the GenBank 
databases, SwissProt, PDB and PIR. 

NOV31 is expressed in at least Bone Marrow, Dermis, Hippocampus, Placenta, and
Tonsils. This information was derived by determining the tissue sources of the sequences that
were included in the invention including but not limited to SeqCalling sources, Public EST 
sources, Literature sources, and/or RACE sources. 

The disclosed NOV31 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 31C.

Table 31C. BLAST results for NOV31



Gene Index/Identifier                 Protein/Organism                                                                            Length (aa)  Identity (%)   Positives (%)  Expect
gi|232260|sp|P01893|HLAH_HUMAN        HLA class I histocompatibility antigen, alpha chain H precursor (HLA-AR) (HLA-12.4)         362          330/376 (87%)  339/376 (89%)  e-167
gi|70075|pir||HLHU12                  MHC class I histocompatibility antigen HLA alpha chain precursor (clone pHLA 12.4) - human  359          329/373 (88%)  337/373 (90%)  e-166
gi|2118771|pir||I54493                MHC class I histocompatibility antigen HLA-A alpha chain precursor - human                  365          303/376 (80%)  326/376 (86%)  e-164
gi|4164602|gb|AAD05568.1| (AF116214)  MHC class I antigen heavy chain [Homo sapiens]                                              365          302/376 (80%)  325/376 (86%)  e-163
gi|915219|gb|AAA73518.1| (U25971)     MHC class I antigen HLA-A2407 [Homo sapiens]                                                365          302/376 (80%)  326/376 (86%)  e-163



Tables 31D-E list the domain descriptions from DOMAIN analysis results against
NOV31. This indicates that the NOV31 sequence has properties similar to those of other
proteins known to contain this domain.



Table 31D. Domain Analysis of NOV31

gnl|Pfam|pfam00129, MHC_I, Class I Histocompatibility antigen, domains
alpha 1 and 2

CD-Length = 179 residues, 99.4% aligned

Score = 284 bits (727), Expect = 5e-78



Table 31E. Domain Analysis of NOV31

gnl|Smart|smart00407, IGc1, Immunoglobulin C-Type

CD-Length = 75 residues, 98.7% aligned

Score = 77.8 bits (190), Expect = 1e-15



The disclosed NOV31 nucleic acid of the invention encoding a MHC Class I antigen-
like protein includes the nucleic acid whose sequence is provided in Table 31A or a fragment
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may
be changed from the corresponding base shown in Table 31A while still encoding a protein
that maintains its MHC Class I antigen-like activities and physiological functions, or a
fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences
are complementary to those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 22 percent of the bases may be so changed.

The disclosed NOV31 protein of the invention includes the MHC Class I antigen-like
protein whose sequence is provided in Table 31B. The invention also includes a mutant or
variant protein any of whose residues may be changed from the corresponding residue shown
in Table 31B while still encoding a protein that maintains its MHC Class I antigen-like
activities and physiological functions, or a functional fragment thereof. In the mutant or
variant protein, up to about 9 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this MHC Class I 
antigen-like protein (NOV31) may function as a member of a "MHC Class I antigen family". 
Therefore, the NOV31 nucleic acids and proteins identified here may be useful in potential
therapeutic applications implicated in (but not limited to) various pathologies and disorders as
indicated below. The potential therapeutic applications for this invention include, but are not
limited to: protein therapeutic, small molecule drug target, antibody target (therapeutic, 
diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene 
therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of 
all tissues and cell types composing (but not limited to) those defined here. 

The NOV31 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the MHC Class I antigen-
like protein (NOV31) may be useful in gene therapy, and the MHC Class I antigen-like protein
(NOV31) may be useful when administered to a subject in need thereof. By way of
nonlimiting example, the compositions of the present invention will have efficacy for
treatment of patients suffering from Von Hippel-Lindau (VHL) syndrome, Alzheimer's
disease, Stroke, Tuberous sclerosis, hypercalcemia, Parkinson's disease, Huntington's disease,
Cerebral palsy, Epilepsy, Lesch-Nyhan syndrome, Multiple sclerosis, Ataxia-
telangiectasia, Leukodystrophies, Behavioral disorders, Addiction, Anxiety, Pain,
Neuroprotection, Tonsillitis, Hemophilia, hypercoagulation, Idiopathic thrombocytopenic
purpura, autoimmune disease, allergies, immunodeficiencies, transplantation, Graft versus host,
or other pathologies or conditions. The NOV31 nucleic acid encoding the MHC Class I
antigen-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV31 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV31 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-
NOVX Antibodies" section below. The disclosed NOV31 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.

NOV32 

A disclosed NOV32 nucleic acid of nucleotides (also referred to as CG57660-01) 
encoding a retinoic acid receptor responder-like protein is shown in Table 32A. The start and 
stop codons are in bold letters. 




Table 32A. NOV32 nucleotide sequence (SEQ ID NO:73).



AGGGGAAGCATGAGACGGCTGCGGATCTCGCTGGCCCCGTGGGTGGGCGCGGGGGACGCGGGAGGGGCCG 
AGCTCACGGGGCCAGCGCCGGGGCCTGCAGGTGGCCCTGGAGGAATCTGCAAGCACCCGCCCGTGCAGCG
GGCCTTCCGGGAGACCAGTGTGGACAGCGCCCTGGACACGCCCTTCCCAGCTGGAACATCTGTGAGGCTG 
GAATTTAAGCTCCGGCAGACAAGCGGCTGGAGGAAGGCCTGGAAGAAACCGAAGTGCAAAGCCCAGCCCG 
AGAGGAGGAAACAGAAATGCCTGACCTGCGTCAAAATGGACTGTGAGGATAAGGTTCTGGGCAGGATGGT
TCGCTGCCCTCCAGAGACGCAGACTCGGCGGGAGCCTGAGGAGCACCAGGGGGCCGGGTGCAGCCCGGCG 
GAGCGGGCGGGGAGGACCCCACGGCGGAGCGGGCGGGGAGGACCCCACGGCTGCCGCTTCCCTGCACGGT 
TCGCCTCCTCCAAGGCCCGGCCCCCAGCGGAGCCCTAGCGCTGAATCGCATGGCGCCCCCTGGAGCCCTG 

GCGGG

In a search of public sequence databases, the NOV32 nucleic acid sequence, located on
chromosome 4, has 443 of 451 bases (98%) identical to a
gb:GENBANK-ID:AF146191|acc:AF146191.1 mRNA from Homo sapiens (Homo sapiens FRG1 (FRG1)
gene, complete cds; 5S ribosomal RNA gene, complete sequence; TUB4q and TIG2
pseudogenes, complete sequence). Public nucleotide databases include all GenBank databases
and the GeneSeq patent database.

The disclosed NOV32 polypeptide (SEQ ID NO:74) encoded by SEQ ID NO:73 has 
amino acid residues and is presented in Table 32B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV32 has a signal peptide and is 
likely to be localized at the plasma membrane with a certainty of 0.8800. 



Table 32B. Encoded NOV32 protein sequence (SEQ ID NO:74). 



MRRLRISLAPWVGAGDAGGAELTGPAPGPAGGPGGICKHPPVQRAFRETSVDSALDTPFP 
AGTSVRLEFKLRQTSGWRKAWKKPKCKAQPERRKQKCLTCVKMDCEDKVLGRMVRCPPET 
QTRREPEEHQGAGCSPAERAGRTPRRSGRGGPHGCRFPARFASSKARPPAEP 
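
As a consistency check between Tables 32A and 32B, the opening of the coding region can be translated with the standard genetic code. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes that translation begins at the first ATG found in the first row of Table 32A.

```python
# Standard genetic code, indexed by codon with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_from_first_atg(dna: str) -> str:
    """Translate from the first ATG until a stop codon or the end of the input."""
    start = dna.find("ATG")
    if start < 0:
        return ""
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for pos in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First row of Table 32A (SEQ ID NO:73), copied from the listing above.
FIRST_ROW_32A = "AGGGGAAGCATGAGACGGCTGCGGATCTCGCTGGCCCCGTGGGTGGGCGCGGGGGACGCGGGAGGGGCCG"
print(translate_from_first_atg(FIRST_ROW_32A))
```

The printed residues can be compared against the start of the first row of Table 32B; a mismatch would point to a transcription error in one of the two listings.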



A search of sequence databases reveals that the NOV32 amino acid sequence has
94/168 (55%) identity and 109/168 (64%) similarity with ptnr:SWISSPROT-ACC:Q99969
RETINOIC ACID RECEPTOR RESPONDER PROTEIN 2 PRECURSOR (TAZAROTENE-INDUCED
GENE 2 PROTEIN) (RAR-RESPONSIVE PROTEIN TIG2) - Homo sapiens.
Public amino acid databases include the GenBank databases, SwissProt, PDB and PIR.

NOV32 is expressed in at least Adipose, Adrenal gland, Breast, Colon, Esophagus, 
Eye, Heart, Kidney, Liver, Lung, Ovary, Parathyroid, Placenta, Prostate, Stomach, Testis, 
Uterus, Whole embryo, bladder tumor, brain, cervix, colon, head and neck, kidney, lung, 
muscle and ovary. This information was derived by determining the tissue sources of the 
sequences that were included in the invention including but not limited to SeqCalling sources, 
Public EST sources, Literature sources, and/or RACE sources. 

The disclosed NOV32 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 32C. 



Table 32C. BLAST results for NOV32

Gene Index/Identifier           Protein/Organism                            Length (aa)  Identity (%)   Positives (%)  Expect

gi|4506427|ref|NP_002880.1|     retinoic acid receptor responder            163          94/169 (55%)   109/169 (63%)  8e-37
(NM_002889)                     (tazarotene induced) 2 [Homo sapiens]

gi|12832179|dbj|BAB21997.1|     homolog to RETINOIC ACID RECEPTOR           162          72/169 (42%)   94/169 (55%)   3e-22
(AK002298)                      RESPONDER PROTEIN 2 PRECURSOR
                                (TAZAROTENE-INDUCED GENE 2 PROTEIN)
                                (RAR-RESPONSIVE PROTEIN TIG2) - putative
                                [Mus musculus]

gi|17436162|ref|XP_067978.1|    similar to retinoic acid receptor           206          73/74 (98%)    74/74 (99%)    1e-20
(XM_067978)                     responder (tazarotene induced) 2
                                [Homo sapiens]

gi|18585954|ref|XP_091375.1|    hypothetical protein XP_091375              322          52/79 (65%)    59/79 (73%)    2e-18
(XM_091375)                     [Homo sapiens]

gi|17488051|ref|XP_064055.1|    similar to Fibroblast growth factor         296          28/54 (51%)    32/54 (58%)    2e-5
(XM_064055)                     receptor 3 precursor (FGFR-3)
                                (Heparin-binding growth factor receptor)
                                [Homo sapiens]
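
The identifier and identity columns of Table 32C follow regular textual patterns, so they can be split into fields mechanically. The following is a minimal sketch under that assumption; the function names are hypothetical and are not part of any BLAST tool.

```python
import re


def parse_gene_index(identifier: str) -> dict:
    """Split a pipe-delimited identifier into database-tag/value pairs."""
    fields = [field for field in identifier.strip().split("|") if field]
    # Fields alternate between a tag (gi, ref, dbj, gb) and its value.
    return dict(zip(fields[0::2], fields[1::2]))


def parse_fraction_cell(cell: str) -> tuple:
    """Parse a cell such as '94/169 (55%)' into (matches, aligned, percent)."""
    match = re.fullmatch(r"\s*(\d+)/(\d+)\s*\((\d+)%\)\s*", cell)
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    return tuple(int(group) for group in match.groups())


print(parse_gene_index("gi|4506427|ref|NP_002880.1|"))
print(parse_fraction_cell("94/169 (55%)"))
```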



Retinoids exert their biologic effects through two families of nuclear receptors, retinoic 
acid receptors (RARs) and retinoid X receptors (RXRs), which belong to the superfamily of 
steroid/thyroid hormone nuclear receptors. The retinoid-mediated up-regulation in the 
expression of TIG2 was confirmed by Northern blot analysis. Upon sequencing, TIG2 was 
found to be a cDNA whose complete sequence was not in the GenBank and EMBL data bases. 
The TIG2 cDNA is 830 bp long and encodes a putative protein product of 164 amino acids. 
TIG2 is neither expressed nor induced by tazarotene in primary keratinocyte and fibroblast
cultures. Thus, TIG2 is expressed and induced by tazarotene only when keratinocytes and
fibroblasts form a tissue-like 3-dimensional structure. RAR-specific retinoids increase TIG2
mRNA levels. In contrast, neither RXR-specific retinoids nor 1,25-dihydroxyvitamin D3
increased TIG2 levels. TIG2 is expressed at high levels in nonlesional psoriatic skin but at
lower levels in the psoriatic lesion, and its expression is up-regulated in psoriatic lesions
after topical application of tazarotene.

The disclosed NOV32 nucleic acid of the invention encoding a retinoic acid receptor
responder-like protein includes the nucleic acid whose sequence is provided in Table 32A or a
fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose
bases may be changed from the corresponding base shown in Table 32A while still encoding a
protein that maintains its retinoic acid receptor responder-like activities and physiological
functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids
whose sequences are complementary to those just described, including nucleic acid fragments
that are complementary to any of the nucleic acids just described. The invention additionally
includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures
include chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 2 percent of the bases may be so changed.

The disclosed NOV32 protein of the invention includes the retinoic acid receptor
responder-like protein whose sequence is provided in Table 32B. The invention also includes
a mutant or variant protein any of whose residues may be changed from the corresponding
residue shown in Table 32B while still encoding a protein that maintains its retinoic acid
receptor responder-like activities and physiological functions, or a functional fragment thereof.
In the mutant or variant protein, up to about 45 percent of the residues may be so changed.
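
The variant allowances quoted for NOV32 (about 2 percent of bases, about 45 percent of residues) appear to track the complement of the whole-percent identities reported earlier in this section (443 of 451 bases, 94 of 168 residues). The sketch below reproduces that arithmetic. Truncating to a whole percent is an assumption inferred from the reported figures, not a rule stated in the text, and the function names are hypothetical.

```python
def truncated_percent_identity(matches: int, aligned: int) -> int:
    """Whole-percent identity, rounded down (assumed reporting convention)."""
    return (100 * matches) // aligned


def variant_allowance_percent(matches: int, aligned: int) -> int:
    """Complement of the whole-percent identity."""
    return 100 - truncated_percent_identity(matches, aligned)


# NOV32 nucleic acid versus AF146191.1: 443 of 451 bases identical.
print(variant_allowance_percent(443, 451))
# NOV32 polypeptide versus Q99969: 94 of 168 residues identical.
print(variant_allowance_percent(94, 168))
```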

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this retinoic acid
receptor responder-like protein (NOV32) may function as a member of a "retinoic acid
receptor responder family". Therefore, the NOV32 nucleic acids and proteins identified here
may be useful in potential therapeutic applications implicated in (but not limited to) various
pathologies and disorders as indicated below. The potential therapeutic applications for this
invention include, but are not limited to: protein therapeutic, small molecule drug target,
antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or
prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue
regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to)
those defined here.

The NOV32 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the retinoic acid receptor
responder-like protein (NOV32) may be useful in gene therapy, and the retinoic acid receptor
responder-like protein (NOV32) may be useful when administered to a subject in need thereof.
By way of nonlimiting example, the compositions of the present invention will have efficacy
for treatment of patients suffering from Cardiomyopathy, Atherosclerosis, Hypertension,
Congenital heart defects, Aortic stenosis, Atrial septal defect (ASD), Atrioventricular (A-V)
canal defect, Ductus arteriosus, Pulmonary stenosis, Subaortic stenosis, Ventricular septal
defect (VSD), valve diseases, Tuberous sclerosis, Scleroderma, Obesity, Transplantation,
Diabetes, Von Hippel-Lindau (VHL) syndrome, Pancreatitis, Obesity, Endometriosis, Fertility,
Hemophilia, Hypercoagulation, Idiopathic thrombocytopenic purpura,
Immunodeficiencies, Graft versus host, Autoimmune disease, Renal artery stenosis, Interstitial
nephritis, Glomerulonephritis, Polycystic kidney disease, Systemic lupus erythematosus,
Renal tubular acidosis, IgA nephropathy, Hypercalcemia, Lesch-Nyhan syndrome, or other
pathologies or conditions. The NOV32 nucleic acid encoding the retinoic acid receptor
responder-like protein of the invention, or fragments thereof, may further be useful in
diagnostic applications, wherein the presence or amount of the nucleic acid or the protein is
to be assessed.

NOV32 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immunospecifically to the novel NOV32 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV32 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding the pathology of the disease and in developing new drug targets for various
disorders.
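
One common way to locate candidate hydrophilic regions from a hydrophobicity chart is a sliding-window average over a published hydropathy scale. The sketch below uses the Kyte-Doolittle scale as a stand-in; the text does not say which scale or window the inventors used, and the window length of 7 and the cutoff of -1.5 are illustrative assumptions, as are the function names.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 7) -> list:
    """Mean hydropathy of every full-length window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophilic_windows(sequence: str, window: int = 7, cutoff: float = -1.5) -> list:
    """Return (1-based start, peptide) for windows whose mean falls below the cutoff."""
    profile = hydropathy_profile(sequence, window)
    return [
        (i + 1, sequence[i:i + window])
        for i, mean_score in enumerate(profile)
        if mean_score < cutoff
    ]


# A lysine- and arginine-rich stretch taken from the second row of Table 32B.
fragment = "KAWKKPKCKAQPERRKQK"
for start, peptide in hydrophilic_windows(fragment):
    print(start, peptide)
```

Windows flagged this way are only candidates; surface exposure and uniqueness of the peptide would still need to be assessed before choosing an immunogen.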

NOV33 

A disclosed NOV33 nucleic acid of 1706 nucleotides (also referred to as ) encoding a
PHOSPHATIDYLINOSITOL 4-PHOSPHATE 5-KINASE-like protein is shown in Table
33A. The start and stop codons are in bold letters.



Table 33A. NOV33 nucleotide sequence (SEQ ID NO:75). 

CTGCCAAGATGGCGTCGGCCTCCTCCCAACCGTCGTTGGCGGTCGGTTTTTCATCCTTTGATCCCGGGGC 
CCCTTCCTGTACCGCGTCCTCAGCATCTGGAATCTTGAGCCCCACGGCATCTGAGGTGCCTTATGCCTCT 
GGCATGCCCATCAAGAAAACAGGCCATCGAGGTGTCGATTCCTCAGGAGAGACAACATATAAAAAGACAA 
CCTCAACAGCCTTGAAAGGTGCCATCCAGTTAGGCATTACTTACACTGTGGGGAGCCTGAGTACCAAACC
AGAGCGTGATGTCCTCATGCAAGATTTCTACGTGGTGGAGAGTATCTTCTTCCCCAGTGAAGGGAGCAAC 
CTGACCCCTGCTCATCACTACAATGCCTTCCGTTTCAAGACCTATGCGCCGGTTGCCTTCCGCTACTTTC 
GGGAGCTATTTGGTATCCCGCCCGATGATTACTTGTGCTCCCTCTGCAGTGAGCCGCTGATTGAACTCTG
TAGCTCTGGAGCTAGTGGTTCCCTGTTCTATGTGTCCAGCGACGATGAACTCATTATTAAGACACTCCAA
CATAAAGAGGCGGAGTTTCTGCAGAAGCTGCTTCCAGGATACTACTTGAACCTCAGCCAGAACCCTCGGA
CTTTGCTGCCTAAATTCTTTGGACTGTACTGTGTGCAGACAGGTGGCAAGAACATTCGGATTGTGGTGAT 
GAACAATCTTTTACCAAGATCCGTCAAAATGCATATCAAATATGACCTCAAAGGCTCAACCTACAAACGC
CGGGCTTCCCAGAAAGAGCGAGAGAAGCCTCTTCCCACATTTAAAGATCTAGACTTCTTACAAGACATCC
CTGATGGTCTTTTTTTGGATGCTGACACGTACAATGCTCTCTGTAAGACCCTGCAGCGTGACTGTTTGGT 
GCTGCAGAGCTTCAAGATAATGGACTATAGCCTCTGGCTGTCAATCCACAATATAGATCATGCACAACGA
GAGCCCTTAAGCAGCGACACTCTTCAAGTGTCAATCGACACTCAAAGACTGGCTCCCCAAAAGGCTCTGT
ATTCCACAGCCATGGAATTCATCCAGGGAGAGGCTCGGCTGGGCGACACCATGGAGGCCGATGACCATAT
GGGTGGCATCCCTGCTCAGAATAGTAAAGGGGAAAGGCTTCTGCTTTATATTGGCATCATTGACATTCTA 
CAGTCTTACACGTTTCTTAAGAAGTTGGAGCACTCTTGGAAAGCCGTGGTACATGATGGGGACGCTGTCT 
CAGTGCATCGCCCAGGCTTCTACGCTGAACGGTTCCAGCACTTCATGTGCAACGCAGTATTTAAGAAGAT
CCCCTTGAAGCCTTCTCCTTCCAAAAAGTTTCGGTCTGGCTTATCTTTCTCTCTGCATACGGGCTCCAGT 
GGCAACTCCTGCATTACTTACCAGCCATTGGTCTCTGAGGAACACAAGTCACAAGTGATAAAGGTGCAAG 
TGGAGCCAGGTGTTCACCTTGGTCGTTCTGATGTTTTACCTCAGACCTCAGAATCCACCTTTGGAGGAAA 
TCAGGGAGGGCTCACTATTACTGACCACAGTTTCTCACCTGTAGTTGGAAAGACTTTGCATATGCTAACT 
ACAAGTATAACCTTGGAAAAACTTGAATGTACAGAGTCAGAGTTCACCCATTAAGCGCAAAGCCTCAGAA
GACCTGGAACAAGATTCTGCTTCTCT 



In a search of public sequence databases, the NOV33 nucleic acid sequence, located on
chromosome 6, has 1586 of 1706 bases (92%) identical to a
gb:GENBANK-ID:HSU78575|acc:U78575.1 mRNA from Homo sapiens (Human 68 kDa type I
phosphatidylinositol-4-phosphate 5-kinase alpha mRNA, clone PIP5KIa1, complete cds).
Public nucleotide databases include all GenBank databases and the GeneSeq patent database.

The disclosed NOV33 polypeptide (SEQ ID NO:76) encoded by SEQ ID NO:75 has 
551 amino acid residues and is presented in Table 33B using the one-letter amino acid code. 



Table 33B. Encoded NOV33 protein sequence (SEQ ID NO:76). 

MASASSQPSLAVGFSSFDPGAPSCTASSASGILSPTASEVPYASGMPIKKTGHRGVDSSG
ETTYKKTTSTALKGAIQLGITYTVGSLSTKPERDVLMQDFYVVESIFFPSEGSNLTPAHH
YNAFRFKTYAPVAFRYFRELFGIPPDDYLCSLCSEPLIELCSSGASGSLFYVSSDDELII
KTLQHKEAEFLQKLLPGYYLNLSQNPRTLLPKFFGLYCVQTGGKNIRIVVMNNLLPRSVK
MHIKYDLKGSTYKRRASQKEREKPLPTFKDLDFLQDIPDGLFLDADTYNALCKTLQRDCL
VLQSFKIMDYSLWLSIHNIDHAQREPLSSDTLQVSIDTQRLAPQKALYSTAMEFIQGEAR
LGDTMEADDHMGGIPAQNSKGERLLLYIGIIDILQSYTFLKKLEHSWKAVVHDGDAVSVH
RPGFYAERFQHFMCNAVFKKIPLKPSPSKKFRSGLSFSLHTGSSGNSCITYQPLVSEEHK
SQVIKVQVEPGVHLGRSDVLPQTSESTFGGNQGGLTITDHSFSPVVGKTLHMLTTSITLE
KLECTESEFTH

A search of sequence databases reveals that the NOV33 amino acid sequence has 478
of 551 amino acid residues (86%) identical to, and 496 of 551 amino acid residues (90%)
similar to, the 549 amino acid residue ptnr:SPTREMBL-ACC:Q99754 protein from Homo
sapiens (Human) (68 KDA TYPE I PHOSPHATIDYLINOSITOL-4-PHOSPHATE 5-KINASE
ALPHA (EC 2.7.1.68) (1-PHOSPHATIDYLINOSITOL-4-PHOSPHATE KINASE)
(DIPHOSPHOINOSITIDE KINASE) (PTDINS(4)P-5-KINASE)). Public amino acid
databases include the GenBank databases, SwissProt, PDB and PIR.

NOV33 is expressed in at least Adrenal gland, Aorta, B-cells, Blood, Bone, Brain,
Breast, CNS, Colon, Ear, Esophagus, Eye, Gall bladder, Germ Cell, Head and neck, Heart,
Kidney, Larynx, Liver, Lung, Lymph, Marrow, Muscle, Neural, Omentum, Ovary, Pancreas,
Parathyroid, Peripheral nervous system, Placenta, Pooled, Prostate, Skin, Small intestine,
Spleen, Stomach, Synovial membrane, Testis, Tissue culture, Tonsil, Uterus, Whole embryo,
and adrenal gland. Expression information was derived from the tissue sources of the
sequences that were included in the derivation of the sequence of CG57672-01. The sequence
is predicted to be expressed in the following tissues because of the expression pattern of
(GENBANK-ID: gb:GENBANK-ID:HSU78575|acc:U78575.1) a closely related Human 68
kDa type I phosphatidylinositol-4-phosphate 5-kinase alpha mRNA, clone PIP5KIa1,
complete cds homolog in species Homo sapiens: fetal brain. This information was derived by
determining the tissue sources of the sequences that were included in the invention including
but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE
sources.

The disclosed NOV33 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 33C. 



Table 33C. BLAST results for NOV33

Gene Index/Identifier           Protein/Organism                            Length (aa)  Identity (%)   Positives (%)  Expect

gi|4505815|ref|NP_003548.1|     phosphatidylinositol-4-phosphate            549          478/552 (86%)  496/552 (89%)  0.0
(NM_003557)                     5-kinase, type I, alpha [Homo sapiens]

gi|1743873|gb|AAC50911.1|       68 kDa type I                               562          478/565 (84%)  496/565 (87%)  0.0
(U78576)                        phosphatidylinositol-4-phosphate
                                5-kinase alpha [Homo sapiens]

gi|1743875|gb|AAC50912.1|       68 kDa type I                               500          439/551 (79%)  455/551 (81%)  0.0
(U78577)                        phosphatidylinositol-4-phosphate
                                5-kinase alpha [Homo sapiens]

gi|6679331|ref|NP_032873.1|     phosphatidylinositol-4-phosphate            546          434/552 (78%)  470/552 (84%)  0.0
(NM_008847)                     5-kinase, type 1 beta; PI4P5K-I[b]
                                [Mus musculus]

gi|14745097|ref|XP_018166.2|    similar to                                  435          361/408 (88%)  381/408 (92%)  0.0
(XM_018166)                     phosphatidylinositol-4-phosphate
                                5-kinase, type I, alpha (H. sapiens)
                                [Homo sapiens]

Tables 33D-E list the domain descriptions from DOMAIN analysis results against
NOV33. This indicates that the NOV33 sequence has properties similar to those of other
proteins known to contain this domain.



Table 33D. Domain Analysis of NOV33

gnl|Smart|smart00330, PIPKc, Phosphatidylinositol phosphate kinases

CD-Length = 339 residues, 100.0% aligned

Score = 410 bits (1054), Expect = 1e-115



Table 33E. Domain Analysis of NOV33

gnl|Pfam|pfam01504, PIP5K, Phosphatidylinositol-4-phosphate 5-Kinase.
This family contains a region from the common kinase core found in the
type I phosphatidylinositol-4-phosphate 5-kinase (PIP5K) family as
described in. The family consists of various type I, II and III PIP5K
enzymes. PIP5K catalyses the formation of phosphoinositol-4,5-bisphosphate
via the phosphorylation of phosphatidylinositol-4-phosphate,
a precursor in the phosphoinositide signaling pathway.

CD-Length = 293 residues, 99.7% aligned

Score = 390 bits (1002), Expect = 1e-109
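
The "CD-Length" and "% aligned" figures in Tables 33D and 33E can be turned into an approximate count of domain-model positions covered by the NOV33 alignment. This is a small sketch that assumes the percentage is the fraction of the conserved-domain model length included in the alignment; the function name is hypothetical.

```python
def aligned_domain_positions(cd_length: int, percent_aligned: float) -> int:
    """Approximate number of domain-model positions covered by the alignment."""
    return round(cd_length * percent_aligned / 100)


# Table 33D: PIPKc model, 339 residues, 100.0% aligned.
print(aligned_domain_positions(339, 100.0))
# Table 33E: PIP5K model, 293 residues, 99.7% aligned.
print(aligned_domain_positions(293, 99.7))
```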



Phosphatidylinositol-4-phosphate 5-kinases (PIP5K) synthesize phosphatidylinositol-4,5-bisphosphate,
a key precursor in phosphoinositide signaling that also regulates some
proteins and cellular processes directly. Two distinct PIP5Ks have been characterized in
erythrocytes, the 68-kDa type I (PIP5KI) and 53-kDa type II (PIP5KII) isoforms. Using
peptide sequences from the erythroid 68-kDa PIP5KI, cDNAs encoding PIP5KI-alpha from human
brain were isolated. Partial cDNAs obtained for a second isoform, PIP5KI-beta, established that
the human STM7 gene encoded a previously unrecognized PIP5KI. However, the peptide
sequences demonstrated that erythroid PIP5KI corresponded to PIP5KI-alpha. Recombinant,
bacterially expressed PIP5KI-alpha possessed PIP5K activity and was immunoreactive with
erythroid PIP5KI antibodies. By Northern analysis, PIP5KI-alpha and PIP5KI-beta had wide tissue
distributions, but their expression levels differed greatly. PIP5KIs had homology to the kinase
domains of PIP5KII-alpha, yeast Mss4p and Fab1p, and a new Caenorhabditis elegans Fab1-like
protein identified in the data base. These new isoforms have refined the sequence requirements
for PIP5K activity and, potentially, regulation of these enzymes. Furthermore, the limited
homology between PIP5KIs and PIP5KII-alpha, which was almost exclusively within the kinase
domain core, provided a molecular basis for distinction between type I and II PIP5Ks.
Phosphatidylinositol-4-phosphate 5-kinases (PIP5Ks) synthesize phosphatidylinositol
4,5-bisphosphate by phosphorylating phosphatidylinositol 4-phosphate. By searching sequence
databases with peptide sequences obtained from the 68-kD type I PIP5K purified from bovine
erythrocytes, Loijens and Anderson (1996) identified a human EST encoding PIP5K1A, which
they called PIP5KI-alpha. They screened a human fetal brain cDNA library and isolated
full-length PIP5K1A cDNAs. The deduced 549-amino acid protein has the conserved kinase
homology domain of PIP5K family members. Within this domain, PIP5K1A shows 83% and
35% amino acid identity with PIP5K1B and PIP5K2A, respectively. Overall, the PIP5K1A
and PIP5K1B proteins are 64% identical. Recombinant PIP5K1A expressed in bacteria had a
molecular mass of approximately 66.3 kD by Western blot analysis. The authors isolated
additional PIP5K1A cDNAs which they suggested represent splicing isoforms. Northern blot
analysis detected a major 4.2-kb PIP5K1A transcript which had a wide tissue distribution.

The disclosed NOV33 nucleic acid of the invention encoding a
PHOSPHATIDYLINOSITOL 4-PHOSPHATE 5-KINASE-like protein includes the nucleic
acid whose sequence is provided in Table 33A or a fragment thereof. The invention also
includes a mutant or variant nucleic acid any of whose bases may be changed from the
corresponding base shown in Table 33A while still encoding a protein that maintains its
PHOSPHATIDYLINOSITOL 4-PHOSPHATE 5-KINASE-like activities and physiological
functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids
whose sequences are complementary to those just described, including nucleic acid fragments
that are complementary to any of the nucleic acids just described. The invention additionally
includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures
include chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 8 percent of the bases may be so changed.

The disclosed NOV33 protein of the invention includes the
PHOSPHATIDYLINOSITOL 4-PHOSPHATE 5-KINASE-like protein whose sequence is
provided in Table 33B. The invention also includes a mutant or variant protein any of whose
residues may be changed from the corresponding residue shown in Table 33B while still
encoding a protein that maintains its PHOSPHATIDYLINOSITOL 4-PHOSPHATE 5-KINASE-like
activities and physiological functions, or a functional fragment thereof. In the
mutant or variant protein, up to about 14 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.
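
To make the NOV33 variant limits concrete, the percentages given above (about 8 percent of the 1706 bases, about 14 percent of the 551 residues) can be converted into approximate position counts. This is a rough sketch only: the text says "up to about", so rounding down is an assumption, and the function name is hypothetical.

```python
def max_changed_positions(length: int, percent: int) -> int:
    """Largest whole number of positions not exceeding the stated percentage."""
    return (length * percent) // 100


# NOV33 nucleic acid: 1706 nucleotides, up to about 8 percent changed.
print(max_changed_positions(1706, 8))
# NOV33 polypeptide: 551 residues, up to about 14 percent changed.
print(max_changed_positions(551, 14))
```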

The above defined information for this invention suggests that this
PHOSPHATIDYLINOSITOL 4-PHOSPHATE 5-KINASE-like protein (NOV33) may
function as a member of a "PHOSPHATIDYLINOSITOL 4-PHOSPHATE 5-KINASE
family". Therefore, the NOV33 nucleic acids and proteins identified here may be useful in
potential therapeutic applications implicated in (but not limited to) various pathologies and
disorders as indicated below. The potential therapeutic applications for this invention include,
but are not limited to: protein therapeutic, small molecule drug target, antibody target
(therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic
marker, gene therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo
and in vitro of all tissues and cell types composing (but not limited to) those defined here.

The NOV33 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the
PHOSPHATIDYLINOSITOL 4-PHOSPHATE 5-KINASE-like protein (NOV33) may be
useful in gene therapy, and the PHOSPHATIDYLINOSITOL 4-PHOSPHATE 5-KINASE-like
protein (NOV33) may be useful when administered to a subject in need thereof. By way
of nonlimiting example, the compositions of the present invention will have efficacy for
treatment of patients suffering from Cardiomyopathy, Atherosclerosis, Hypertension,
Congenital heart defects, Aortic stenosis, Atrial septal defect (ASD), Atrioventricular (A-V)
canal defect, Ductus arteriosus, Pulmonary stenosis, Subaortic stenosis, Ventricular septal
defect (VSD), valve diseases, Tuberous sclerosis, Scleroderma, Obesity, Transplantation,
Adrenoleukodystrophy, Congenital Adrenal Hyperplasia, Hemophilia,
Hypercoagulation, Idiopathic thrombocytopenic purpura, Immunodeficiencies, Graft versus
host, Von Hippel-Lindau (VHL) syndrome, Alzheimer's disease, Stroke, Tuberous sclerosis,
Hypercalcemia, Parkinson's disease, Huntington's disease, Cerebral palsy, Epilepsy,
Lesch-Nyhan syndrome, Multiple sclerosis, Ataxia-telangiectasia, Leukodystrophies, Behavioral
disorders, Addiction, Anxiety, Pain, Neuroprotection, or other pathologies or conditions. The
NOV33 nucleic acid encoding the PHOSPHATIDYLINOSITOL 4-PHOSPHATE 5-KINASE-like
protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV33 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immunospecifically to the novel NOV33 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV33 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding the pathology of the disease and in developing new drug targets for various
disorders.

NOV34 

A disclosed NOV34 nucleic acid of 1316 nucleotides (also referred to as CG57680-01)
encoding a Cyclophilin-type peptidyl-prolyl cis-trans isomerase-like protein is shown in Table 
34A. The start and stop codons are in bold letters. 



Table 34A. NOV34 nucleotide sequence (SEQ ID NO:77). 



TGGGTGCCATGGCGGTTCTACTGGAGACTACTGTGGGCAATGTGGTTGTCAATTTGCACACTGAGCAGCA 
GCCTTGCAACTGTGAACTTTTTGAGAGCAGGTACCACAGTTTAATGGCATTTAATTTCTTGAGATATTAC 
AAAATAAAATATTACAGTTATTGCCTTATTCACAGTATACAAAGGTATTTTATCATACAAACTGTTGATC
CTACAGGGACTGGTCATGGAGGAGAGTCTATTTTTGGCCTAGGATTGTATGGTGATCAAGCAAGCTTTTT
TGAGACAGAAAACGTCCCAAGAATTAAGCACAAGAAGAAGGGCACAATGTCCATGGTGAATAATGACAGT
GATCAACATGGATCTCAGTTTCTTATCACTACAGGAGAAAATCTAGATTACCTTGATGGTACCCATACAG
TATTTGGTGAGGTGACAGAAGGCATTGACATAATTAAGAAAATAAATGAGACCTTTGTTGACAAGGACTT 
TGTACCATATCAGGATATCAGGATAAATTATATAGTGATTTTAGATGGTCCATTTGATGACATTCCTGAT
TTATTAATCCCTGATCAATCACCAGAACCTACAAGGGAACAATTAAAGAGTGGTAGAGTTGACACAAATG
AAGAAATTGATCATTTCAAACGAAGGTCAGCCGAAGAAGTAGAAGAAATAAAGGCAGAAAAAGAAGCTAA
AACTCAGGCTTTACTTTTAGAGATGGTGGGAGACCTACCTGATGCAGATATTAAACCTCCGGAAAAATCT 
GTGTTTGTATGCAAATTGAATCCAGTGACCACAGATGAGGATCTGGATATAATACTCTCTAGATTTGGGC 
CAATAAGAAGTTGTGAAGTTATCTGGGACTGGAAGACAGGAGAAATCCTCTGTTATTTCTTTCTTTCTTT 
CTATGCTTTTATTGAATTTGAAAAGGAAGAAGATTATGAGAAAGCCTTCTTCAAAATGGACAATATACTT
ATAGATGACAGAAGAAAACATGGATTTGCCAGTCTGTTACAAAGGTTAAATGGAAGGAAAAAGTGGGAAA
TACACCAACAGCCGGGGCGCAGCCCACGCCGCCGCCGCCGCCCGCACCGTTCCCGCTCCCGGCGGCGGCG 
ACGGCGGGCGGGGACCCCGCGGCGCTGCGCCTCATCCTCTGCGACGGGGGATGAGGGGTCTGAGAGGAAC
TGGAGGAGGAGGAGGAGGAGGCAGTGGACATGGTGGCTCTGCGGGCCCTGGACGCCGCCGGCCGACACGG
AGGCGCGGGGCGGGAGGCGCGCGAGCGTGTGGGAGCCGGCGCGATAGTGCGTGCGA 



In a search of public sequence databases, the NOV34 nucleic acid sequence, located on
chromosome 7, has 149 of 235 bases (63%) identical to a
gb:GENBANK-ID:A45258|acc:A45258.1 mRNA from human herpesvirus 2 (Sequence 2 from Patent
WO9516779). Public nucleotide databases include all GenBank databases and the GeneSeq
patent database.

The disclosed NOV34 polypeptide (SEQ ID NO:78) encoded by SEQ ID NO:77 has
432 amino acid residues and is presented in Table 34B using the one-letter amino acid code.
Signal P, Psort and/or Hydropathy results predict that NOV34 has a signal peptide and is
likely to be localized at the plasma membrane with a certainty of 0.9820.



Table 34B. Encoded NOV34 protein sequence (SEQ ID NO:78). 



MAVLLETTVGNVVVNLHTEQQPCNCELFESRYHSLMAFNFLRYYKIKYYSYCLIHSIQRY
FIIQTVDPTGTGHGGESIFGLGLYGDQASFFETENVPRIKHKKKGTMSMVNNDSDQHGSQ
FLITTGENLDYLDGTHTVFGEVTEGIDIIKKINETFVDKDFVPYQDIRINYIVILDGPFD
DIPDLLIPDQSPEPTREQLKSGRVDTNEEIDHFKRRSAEEVEEIKAEKEAKTQALLLEMV
GDLPDADIKPPEKSVFVCKLNPVTTDEDLDIILSRFGPIRSCEVIWDWKTGEILCYFFLS
FYAFIEFEKEEDYEKAFFKMDNILIDDRRKHGFASLLQRLNGRKKWEIHQQPGRSPRRRR
RPHRSRSRRRRRRAGTPRRCASSSATGDEGSERNWRRRRRRQWTWWLCGPWTPPADTEAR
GGRRASVWEPAR



A search of sequence databases reveals that the NOV34 amino acid sequence has
263/345 (76%) identity and 291/345 (84%) similarity with TREMBLNEW-ACC:BAB30711
6 DAYS NEONATE HEAD CDNA, RIKEN FULL-LENGTH ENRICHED LIBRARY,
CLONE:5430431E21, FULL INSERT SEQUENCE - Mus musculus. Public amino acid
databases include the GenBank databases, SwissProt, PDB and PIR.

NOV34 is expressed in at least Adrenal Gland/Suprarenal gland, Bone, Bone Marrow,
Brain, Colon, Hair Follicles, Heart, Hippocampus, Kidney, Liver, Lung, Lymphoid tissue,
Pancreas, Peripheral Blood, Prostate, Salivary Glands, Small Intestine, Testis, Tonsils, Uterus.
This information was derived by determining the tissue sources of the sequences that were
included in the invention including but not limited to SeqCalling sources, Public EST sources,
Literature sources, and/or RACE sources.

The disclosed NOV34 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 34C.



Table 34C. BLAST results for NOV34

Gene Index/Identifier           Protein/Organism                            Length (aa)  Identity (%)   Positives (%)  Expect

gi|12849069|dbj|BAB28194.1|     homolog to DJ12G14.1 (NOVEL                 382          261/331 (78%)  284/331 (84%)  e-138
(AK012371)                      CYCLOPHILIN TYPE PEPTIDYL-PROLYL
                                CIS-TRANS ISOMERASE) (FRAGMENT) -
                                putative [Mus musculus]

gi|12847571|dbj|BAB27623.1|     homolog to DJ12G14.1 (NOVEL                 418          261/331 (78%)  284/331 (84%)  e-137
(AK011443)                      CYCLOPHILIN TYPE PEPTIDYL-PROLYL
                                CIS-TRANS ISOMERASE) (FRAGMENT) -
                                putative [Mus musculus]

gi|12856573|dbj|BAB30711.1|     homolog to DJ12G14.1 (NOVEL                 460          261/331 (78%)  284/331 (84%)  e-136
(AK017370)                      CYCLOPHILIN TYPE PEPTIDYL-PROLYL
                                CIS-TRANS ISOMERASE) (FRAGMENT) -
                                putative [Mus musculus]

gi|12852237|dbj|BAB29330.1|     homolog to DJ12G14.1 (NOVEL                 492          259/331 (78%)  284/331 (85%)  e-135
(AK014406)                      CYCLOPHILIN TYPE PEPTIDYL-PROLYL
                                CIS-TRANS ISOMERASE) (FRAGMENT) -
                                putative [Mus musculus]

gi|18088111|gb|AAH20986.1|      protein for MGC:9727) [Homo sapiens]        492          263/331 (79%)  284/331 (85%)  e-135
AAH20986 (BC020986)
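
The Expect column of Table 34C uses a shorthand in which a leading mantissa of 1 is omitted (for example "e-138"). The sketch below normalizes such entries to floating-point values so that hits can be compared numerically. It assumes the shorthand stands for 1e-138, which is how such summary lines are conventionally read; the function name is hypothetical.

```python
def parse_expect(text: str) -> float:
    """Convert an Expect entry such as 'e-138', '8e-20' or '0.0' to a float."""
    cleaned = text.strip().replace(" ", "")
    if cleaned.startswith("e"):
        # Shorthand with the mantissa omitted: read 'e-138' as '1e-138'.
        cleaned = "1" + cleaned
    return float(cleaned)


# Expect values as listed in Table 34C, top to bottom.
expect_column = ["e-138", "e-137", "e-136", "e-135", "e-135"]
values = [parse_expect(entry) for entry in expect_column]
# Smaller Expect values indicate more significant matches.
print(values == sorted(values))
```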



Tables 34D-E list the domain descriptions from DOMAIN analysis results against 
NOV34. This indicates that the NOV34 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 34D. Domain Analysis of NOV34

gnl|Pfam|pfam00160, pro_isomerase, Cyclophilin type peptidyl-prolyl
cis-trans isomerase

CD-Length = 162 residues, 88.9% aligned

Score = 91.7 bits (226), Expect = 8e-20



Table 34E. Domain Analysis of NOV34 

gnl|Pfam|pfam00076, rrm, RNA recognition motif (a.k.a. RRM, RBD, or
RNP domain). The RRM motif is probably diagnostic of an RNA binding
protein. RRMs are found in a variety of RNA binding proteins,
including various hnRNP proteins, proteins implicated in regulation of
alternative splicing, and protein components of snRNPs. The motif also
appears in a few single stranded DNA binding proteins. The RRM
structure consists of four strands and two helices arranged in an
alpha/beta sandwich, with a third helix present during RNA binding in
some cases. The C-terminal beta strand (4th strand) and final helix are
hard to align and have been omitted in the SEED alignment. The LA
proteins have an N terminus rrm which is included in the seed. There is
a second region towards the C terminus that has some features of a rrm
but does not appear to have the important structural core of a rrm.
The LA proteins are one of the main autoantigens in Systemic lupus
erythematosus (SLE), an autoimmune disease.

CD-Length = 71 residues, 95.8% aligned
Score = 47.8 bits (112), Expect = 1e-06



The cyclophilins are a conserved class of proteins that bind the immunosuppressive
drug cyclosporin A (CsA) with high affinity. CsA blocks helper T-cell activation at a step
between T-cell receptor stimulation and the transcriptional activation of cytokine genes.
Cyclophilins from many species possess peptidyl-prolyl cis-trans isomerase (PPIase) activity
that is blocked by CsA and therefore may be relevant in CsA-mediated immunosuppression.
Probing with the previously known cyclophilin cDNA under reduced stringencies, Price et al.
(1991) identified a second cyclophilin gene, which encoded cyclophilin B (CYPB). The
deduced protein was 64% identical to CYPA and was distinguished from it by a signal
sequence that probably directs it to the endoplasmic reticulum (ER). CYPB showed even 
stronger similarity to yeast CYPB, which also has an ER-directed signal sequence. The signal 
sequence is removed from the protein upon expression in E. coli, and the processed protein 
possesses PPIase activity that is inhibited by CsA. Peddada et al. (1992) used the PCR
technique to generate a unique probe complementary to the hydrophobic 5-prime end of the
human cyclophilin B gene. Using this probe in an analysis of human/hamster hybrid somatic 
cell lines, they assigned the gene to chromosome 15. The human PIN1 gene encodes an 
essential nuclear peptidyl-prolyl cis/trans isomerase involved in the regulation of mitosis. 
PIN1 is a member of a new class of peptidyl-prolyl cis/trans isomerases that includes the
Escherichia coli parvulin, yeast ESS1, and Drosophila melanogaster dodo gene products. Lu et
al. (1996) described human PIN1 and showed that deletion of PIN1 from HeLa cells induces
mitotic arrest, while HeLa cells overexpressing PIN1 arrest in the G2 phase. Campbell et al.
(1997) identified a gene closely related to the gene encoding the essential nuclear
peptidyl-prolyl cis/trans isomerase (PIN1) involved in the regulation of mitosis. The novel
gene, called PIN1L by them, is 89% identical at the nucleotide level to the PIN1 transcript,
but contains a shift in the reading frame. It encodes a 100-amino acid variant protein
consisting of 63 amino acids homologous (90% identical) to PIN1 and contains the entire WW
domain, fused to a 37-amino acid tail. By fluorescence in situ hybridization and somatic cell
hybrid analysis, Campbell et al. (1997) mapped PIN1L to 1p31. They commented that the
protein encoded by PIN1L may have some functional role or, alternatively, PIN1L may be a
transcribed pseudogene. Campbell et al. (1997) found by analysis of human expressed sequence
tags (ESTs) 2 different but closely related human transcripts, 1 of which corresponds to PIN1.
Gene localization, using both fluorescence in situ hybridization and tritium-labeled probes,
showed that each of the human transcripts hybridized to 1p31 and 19p13. Primers were
designed to discriminate between the 2 transcripts, and PCR on DNA from hamster/human
somatic cell hybrids retaining chromosomes 1 or 19 was used to map the human PIN1 gene to
chromosome 19, and PIN1L, the closely related gene, to chromosome 1. Their results
established that PIN1 is at 19p13 and PIN1L at 1p31. PCR was used to clone the coding
region for PIN1L. The PIN1L cDNA is 89% identical at the nucleotide level to the PIN1
transcript, but contained a shift in the reading frame. Campbell et al. (1997) commented that
the protein encoded by PIN1L may have some functional role or, alternatively, PIN1L may be
a transcribed pseudogene. With the knowledge that PIN1 specifically isomerizes
phosphorylated serine or threonine residues that precede proline and regulates the function of
mitotic phosphoproteins, and hypothesizing that restoring the function of phosphorylated tau
might prevent or reverse paired helical filament (PHF) formation in Alzheimer disease, Lu et
al. (1999) demonstrated that the WW domain of PIN1 binds to phosphorylated tau at
threonine-231 (T231). The T231 residue is hyperphosphorylated in Alzheimer disease and is 
phosphorylated to a certain extent in the normal brain. Using a pulldown assay, Lu et al.
(1999) demonstrated that PIN1 binds to hyperphosphorylated tau from the brains of people
with Alzheimer disease but not to tau from age-matched healthy brains. By immunoblotting,
Lu et al. (1999) detected endogenous PIN1 in the PHFs of diseased brains, and using 
immunohistochemistry, they found that recombinant PIN1 binds to pathologic tau. Using 
immunohistochemistry, Lu et al. (1999) localized PIN1 to the nucleus in healthy brains. In the
brains of people with Alzheimer disease, PIN1 staining was associated with pathologic tau in
neuronal cells. Lu et al. (1999) also demonstrated that phosphorylated tau could neither bind
microtubules nor promote microtubule assembly. However, PIN1 was able to restore the
ability of phosphorylated tau to bind microtubules and promoted microtubule assembly in 
vitro. The level of soluble PIN1 in the brains of Alzheimer patients was greatly reduced
compared to that in age-matched control brains. The authors concluded with the hypothesis
that since depletion of PIN1 induces mitotic arrest and apoptotic cell death, sequestration of
PIN1 into PHFs may contribute to neuronal death. In the frog, Pin1 is implicated in the
regulation of cell cycle progression and required for the DNA replication checkpoint. By 
fluorescence microscopy, Winkler et al. (2000) observed that nuclear extracts from Xenopus
eggs depleted of Pin1 inappropriately transited from the G2 to the M phase of the cell cycle in
the presence of a DNA replication inhibitor. Immunoblot analysis revealed that inappropriate
transition was accompanied by hyperphosphorylation of CDC25 (see CDC25A), activation of
CDC2/cyclin B and mitotic phosphoproteins. Addition of recombinant wildtype, but not
mutant, Pin1 reversed the defect in replication checkpoint function.

The disclosed NOV34 nucleic acid of the invention encoding a Cyclophilin-type
peptidyl-prolyl cis-trans isomerase-like protein includes the nucleic acid whose sequence is
provided in Table 34A or a fragment thereof. The invention also includes a mutant or variant
nucleic acid any of whose bases may be changed from the corresponding base shown in Table
34A while still encoding a protein that maintains its Cyclophilin-type peptidyl-prolyl cis-trans
isomerase-like activities and physiological functions, or a fragment of such a nucleic acid.
The invention further includes nucleic acids whose sequences are complementary to those just 
described, including nucleic acid fragments that are complementary to any of the nucleic acids 
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or 
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least 
in part to enhance the chemical stability of the modified nucleic acid, such that they may be 
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 37 percent of the
bases may be so changed. 
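
The complementary nucleic acids referred to above can be illustrated with a minimal Python sketch. It assumes plain Watson-Crick pairing over the unambiguous bases A, C, G and T only; the function name `reverse_complement` is hypothetical and is not taken from the specification, and the input shown is simply the first 16 nucleotides of Table 35A.

```python
# Watson-Crick pairing for unambiguous DNA bases only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    # Drop whitespace so sequences copied from line-wrapped tables can be used.
    cleaned = "".join(seq.split()).upper()
    if set(cleaned) - set("ACGT"):
        raise ValueError("sequence contains characters other than A, C, G, T")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return cleaned.translate(_COMPLEMENT)[::-1]


print(reverse_complement("GACCTCAGAAGCCATG"))  # prints CATGGCTTCTGAGGTC
```

Modified bases or derivatized backbones, which the text also contemplates, are outside the scope of this sketch.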

The disclosed NOV34 protein of the invention includes the Cyclophilin-type peptidyl- 
prolyl cis-trans isomerase-like protein whose sequence is provided in Table 34B. The 
invention also includes a mutant or variant protein any of whose residues may be changed
from the corresponding residue shown in Table 34B while still encoding a protein that
maintains its Cyclophilin-type peptidyl-prolyl cis-trans isomerase-like activities and
physiological functions, or a functional fragment thereof. In the mutant or variant protein, up
to about 24 percent of the residues may be so changed. 
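
As a rough illustration of the percentage limits stated above, the following Python sketch counts position-by-position differences between a reference and a variant sequence. It assumes substitutions only (equal lengths, no insertions or deletions), which is a simplification not stated in the text; the function names and the short example peptides are hypothetical.

```python
def percent_changed(reference: str, variant: str) -> float:
    """Percentage of positions at which the variant differs from the reference."""
    if not reference or len(reference) != len(variant):
        # This sketch handles substitutions only, so lengths must match.
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for a, b in zip(reference, variant) if a != b)
    return 100.0 * differences / len(reference)


def within_limit(reference: str, variant: str, limit_percent: float) -> bool:
    """True if the variant changes no more than limit_percent of positions."""
    return percent_changed(reference, variant) <= limit_percent


# Hypothetical 10-residue peptides differing at one position (E -> D).
print(percent_changed("MLKPHSEARA", "MLKPHSDARA"))   # prints 10.0
print(within_limit("MLKPHSEARA", "MLKPHSDARA", 24))  # prints True
```

A comparison that allowed insertions or deletions would need an alignment step first; that is deliberately left out here.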

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this Cyclophilin-type 
peptidyl-prolyl cis-trans isomerase-like protein (NOV34) may function as a member of a 
"Cyclophilin-type peptidyl-prolyl cis-trans isomerase family". Therefore, the NOV34 nucleic 
acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein 
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug 
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene 
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV34 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the Cyclophilin-type 
peptidyl-prolyl cis-trans isomerase-like protein (NOV34) may be useful in gene therapy, and
the Cyclophilin-type peptidyl-prolyl cis-trans isomerase-like protein (NOV34) may be useful
when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from Cardiomyopathy, Atherosclerosis, Hypertension, Congenital heart defects, Aortic stenosis,
Atrial septal defect (ASD), Atrioventricular (A-V) canal defect, Ductus arteriosus, Pulmonary
stenosis, Subaortic stenosis, Ventricular septal defect (VSD), valve diseases, Tuberous
sclerosis, Scleroderma, Obesity, Transplantation, Adrenoleukodystrophy, Congenital Adrenal
Hyperplasia, Hemophilia, Hypercoagulation, Idiopathic thrombocytopenic purpura,
Immunodeficiencies, Graft versus host, Von Hippel-Lindau (VHL) syndrome, Alzheimer's
disease, Stroke, Tuberous sclerosis, hypercalcemia, Parkinson's disease, Huntington's disease,
Cerebral palsy, Epilepsy, Lesch-Nyhan syndrome, Multiple sclerosis, Ataxia-telangiectasia,
Leukodystrophies, Behavioral disorders, Addiction, Anxiety, Pain,
Neuroprotection, or other pathologies or conditions. The NOV34 nucleic acid encoding the
Cyclophilin-type peptidyl-prolyl cis-trans isomerase-like protein of the invention, or fragments
thereof, may further be useful in diagnostic applications, wherein the presence or amount of
the nucleic acid or the protein is to be assessed.

NOV34 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV34 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV34 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various
disorders.



NOV35 

A disclosed NOV35 nucleic acid of 1647 nucleotides (also referred to as CG57670-01)
encoding a pyruvate kinase-like protein is shown in Table 35A. The start and stop codons are
in bold letters.

Table 35A. NOV35 nucleotide sequence (SEQ ID NO:79).

GACCTCAGAAGCCATGTTGAAGCCCCATAGTGAAGCCAGGGCTGCCTTCATTCAGACCCAGCAGCTGCAC 
GCAGCCATGGCTGACACATTCTTGGAGCACATGTGCTGCCTGGACACTGACTCGCCACCCATCACAGCCT 
GGAGCACTGGCATCATCTGTACTATGGGCCCAGCTTCTCCATTGCTAGAGATGCTGAAGAAAACGATTAA 
GTCTGGAATTAATGTGGCTCATCTGAACTCTCATGGAGCCCATGAGTACCATACAGAGACCATCAAGAAC 
GTGCGCACAGCCACGGAAAGCTTTGCTTCTGACTCCATCCTCTACCAGCCCATTGCTGTGGCTCCAGACA 
CTAAAGGACCTGAGATCCCAACTGGGCCCGTCAAGGGCAGCGGCACTGCAGAGGTGGAGCTGAAGAAGGG
AGCCACTCTCAAGTTCACGCTGGATAATACCTACATGGAAAAGGGTAAAGAGAACATCCTGTGGCGGGAC 
TACAAGAACATCTGCAAGGTGGTGGAAGTGGGCTGCAAGATCTACGTGGATGATGGGCTAATTTCTCTCC 
AAGTGAAGCAGAAGGATGCTCACTTTCTGGTGACAGAGGTGGAAAATGGTGGCTCCTTGGGCAGCAAGAA
GAGTGTGAACCTTCCTGGGGCTGCCGTGGACCTGTCTGCCATGTTGGAGAAGGACATCCAGGACCTGAAG 
TTTGGGGGCGAGCAAGATGTCGATATGATGTTTTCATCATTCATCTGCAAGACATCTGATGTCCATGAAG
TTAGGAAGGTCTTGGGAGAGAAAGGAAAGAACAGCAAGATAACCAGCAAAATTGAGAATCATGATGGGGG
TTGGAGGTTTGATGAAATCCTGGAGGCCAGCGATGGGATTATGGTAGCTCGTGGTGATCCACCACAAGCC 
GTCGAGATGGAGATTCCTGCAGGGAAGGTCTGCCTTGCTCAGAGGATGATGATTGGGTGGTGCAACCAAG 
CTGGGAAGCCTGTCATCTTTGCCACTCAGATGCTAGAGGATGTGATCAAGAAGCTCCACCCCACTTGGGC 
TGAGGGCAGTGGTGTGGCCAATGCAGTTCTGGTGGAAGCTGACTGCATCATGCTGTCTGGAGAAACAGCC 
AAAGGGAACTATCCTCTGGAGGCTGTGCACATGCAGCACCTGATTGCCTGTGAGGCAGAGGCCACCATCT 
ACCACTTGCAATTATTTGAGGAGTTCTGCCACCTGGCACCCATTACCAGTGACCCCGCAGAAGCTACTGC 
CATGGGCACTGTGGAGGCCTCCTTCAAGTGCTGCAGTGGGGCCATAATCGTCCTCACCAAGTCTGCCAGG 
TGTGCCCACCAGGTGGCCAGATACTGCCCACGTGCCCCCATGATTGTTGTGACATGGCATCCCCAGGCAG 
CTCGCCAGGCCCACCTGTACCGTGGTATCTTCCCTGTGCTGTGTAAGGACCCCATCCAGGAGCCCCAGGC 
TGAGGATGTGGACCTCCGAGTGAACTTGGCCATGAATGTTGGTAAGGCCCGAGGCTTCTTCAAGAAGGAT 
GATGTGGTCATTGTGCTGACCTGGGGACACCCTGGCCCTGGCTTCTCCACCACCCTGTGTGTTATTCCTG 
TGCTGTGATGGACTCCAGAGCTCTTCCTCCAGCCCCT 



In a search of public sequence databases, the NOV35 nucleic acid sequence, located on
chromosome 6, has 751 of 894 bases (84%) identical to a
gb:GENBANK-ID:MMM2PK|acc:X97047.1 mRNA from Mus musculus (M. musculus mRNA for M2-type
pyruvate kinase). Public nucleotide databases include all GenBank databases and the GeneSeq 
patent database. 

The disclosed NOV35 polypeptide (SEQ ID NO:80) encoded by SEQ ID NO:79 has 
534 amino acid residues and is presented in Table 35B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV35 has a signal peptide and is 
likely to be localized extracellularly with a certainty of 0.4500. 
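
The relationship between the nucleotide sequence of Table 35A and the encoded polypeptide of Table 35B can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it uses the standard genetic code, starts at the first ATG it finds, and stops at the first in-frame stop codon; the function name `translate_first_orf` is hypothetical, and the specification does not state which software produced the translation.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_first_orf(dna: str) -> str:
    """Translate from the first ATG to the first in-frame stop codon (or the end)."""
    seq = "".join(dna.split()).upper()  # tolerate line-wrapped input
    start = seq.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an unreadable codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 37 nucleotides of Table 35A; the start codon begins at position 14.
print(translate_first_orf("GACCTCAGAAGCCATGTTGAAGCCCCATAGTGAAGCC"))  # prints MLKPHSEA
```

The output for this fragment agrees with the first eight residues listed in Table 35B.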



Table 35B. Encoded NOV35 protein sequence (SEQ ID NO:80).

MLKPHSEARAAFIQTQQLHAAMADTFLEHMCCLDTDSPPITAWSTGIICTMGPASPLLEM
LKKTIKSGINVAHLNSHGAHEYHTETIKNVRTATESFASDSILYQPIAVAPDTKGPEIPT
GPVKGSGTAEVELKKGATLKFTLDNTYMEKGKENILWRDYKNICKVVEVGCKIYVDDGLI
SLQVKQKDAHFLVTEVENGGSLGSKKSVNLPGAAVDLSAMLEKDIQDLKFGGEQDVDMMF
SSFICKTSDVHEVRKVLGEKGKNSKITSKIENHDGGWRFDEILEASDGIMVARGDPPQAV
EMEIPAGKVCLAQRMMIGWCNQAGKPVIFATQMLEDVIKKLHPTWAEGSGVANAVLVEAD
CIMLSGETAKGNYPLEAVHMQHLIACEAEATIYHLQLFEEFCHLAPITSDPAEATAMGTV
EASFKCCSGAIIVLTKSARCAHQVARYCPRAPMIVVTWHPQAARQAHLYRGIFPVLCKDP
IQEPQAEDVDLRVNLAMNVGKARGFFKKDDVVIVLTWGHPGPGFSTTLCVIPVL



A search of sequence databases reveals that the NOV35 amino acid sequence has 
433/533 (81%) identity and 458/533 (85%) similarity with pir-id:S30038 pyruvate kinase (EC
2.7.1.40), muscle splice form M2 - human. Public amino acid databases include the GenBank
databases, SwissProt, PDB and PIR. 
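
The identity and similarity percentages quoted above follow from the match counts by simple arithmetic. The figures in this section appear to be truncated rather than rounded (458/533 is roughly 85.9% but is reported as 85%); that reading is an inference from the numbers, not something the text states. A minimal Python sketch under that assumption, with a hypothetical function name:

```python
def percent_truncated(matches: int, aligned_length: int) -> int:
    """Whole-number percentage of matching positions, truncated toward zero."""
    if aligned_length <= 0 or not 0 <= matches <= aligned_length:
        raise ValueError("need 0 <= matches <= aligned_length and aligned_length > 0")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return (100 * matches) // aligned_length


print(percent_truncated(433, 533))  # prints 81 (identity)
print(percent_truncated(458, 533))  # prints 85 (similarity)
print(percent_truncated(751, 894))  # prints 84 (nucleotide identity)
```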

NOV35 is expressed in at least Adrenal gland, Aorta, B-cells, Blood, Bone, Brain,
Breast, CNS, Colon, Ear, Esophagus, Eye, Gall bladder, Germ Cell, Head and neck, Heart,
Kidney, Larynx, Liver, Lung, Lymph, Marrow, Muscle, Neural, Omentum, Ovary, Pancreas,
Parathyroid, Peripheral nervous system, Placenta, Pooled, Prostate, Skin, Small intestine,
Spleen, Stomach, Synovial membrane, Testis, Tissue culture, Tonsil, Uterus, and Whole embryo.
This information was derived by determining the tissue sources of the
sequences that were included in the invention including but not limited to SeqCalling sources,
Public EST sources, Literature sources, and/or RACE sources.

The disclosed NOV35 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 35C. 



Table 35C. BLAST results for NOV35

Gene Index/                   Protein/Organism                    Length  Identity       Positives      Expect
Identifier                                                        (aa)    (%)            (%)

gi|14750405|ref|XP_037768.1|  similar to pyruvate kinase,         531     433/534 (81%)  458/534 (85%)  0.0
(XM_037768)                   muscle (H. sapiens)

gi|4505839|ref|NP_002645.1|   pyruvate kinase, muscle;            531     432/534 (80%)  458/534 (84%)  0.0
(NM_002654)                   Pyruvate kinase-3; Thyroid
                              hormone-binding protein,
                              cytosolic (p58) [Homo sapiens]

gi|107554|pir||A33983         pyruvate kinase (EC 2.7.1.40)       531     431/534 (80%)  457/534 (84%)  0.0
                              isozyme M2 - human

gi|266427|sp|P14618|          Pyruvate kinase, M1 isozyme         531     433/534 (81%)  458/534 (85%)  0.0
KPY1_HUMAN                    (Pyruvate kinase muscle isozyme)
                              (Cytosolic thyroid
                              hormone-binding protein)
                              (CTHBP) (THBP1)

gi|15215430|gb|AAH12811.1|    (protein for MGC:3932)              531     431/534 (80%)  457/534 (84%)  0.0
AAH12811 (BC012811)           [Homo sapiens]



Tables 35D-E list the domain descriptions from DOMAIN analysis results against
NOV35. This indicates that the NOV35 sequence has properties similar to those of other
proteins known to contain this domain.

Table 35D. Domain Analysis of NOV35

gnl|Pfam|pfam02887, PK_C, Pyruvate kinase, alpha/beta domain

CD-Length = 116 residues, 99.1% aligned

Score = 129 bits (324), Expect = 4e-31



Table 35E. Domain Analysis of NOV35 



gnl|Pfam|pfam00224, PK, Pyruvate kinase, barrel domain. This domain
is actually a small beta-barrel domain nested within a larger TIM
barrel. The active site is found in a cleft between the two domains.

CD-Length = 349 residues, 100.0% aligned

Score = 422 bits (1084), Expect = 3e-119



Pyruvate kinase is also known as ATP:pyruvate phosphotransferase (EC 2.7.1.40). At
least 3 molecular forms with pyruvate kinase activity are known (Bigley et al., 1968). The
form that is deficient in a type of hemolytic anemia is the red cell variety, PK1. PK2 is found
in kidney. PK3 is found in leukocytes, muscle, platelets, and brain but not in red cells or 
kidney. PK1 is found also in liver. A patient with red cell PK deficiency has been found to 
have abnormal liver enzyme also (Bunn, 1981); see Nakashima et al. (1977). During fetal 
development, PK3 changes to PK1 in the liver. PK1 is a tetramer composed of two dissimilar
polypeptides of somewhat different molecular weight. It is an allosteric enzyme exhibiting
cooperative binding for phosphoenolpyruvate and sensitivity to fructose-1,6-diphosphate. PK3
also is a tetrameric protein but, unlike PK1, all subunits are alike and, not unexpectedly, there
is no cooperative behavior. The enzyme is insensitive to fructose-1,6-diphosphate. Patients
with deficiency of red cell PK have normal PK2 and PK3. Tsutsumi et al. (1988) showed that
pyruvate kinase occurs in 4 isozymic forms (L, R, M1, M2) and that these are encoded by 2
different genes, PKL and PKM. The L and R isozymes are generated from the PKL gene by
differential splicing of RNA; the M1 and M2 forms are produced from the PKM gene by
differential splicing. Studies of somatic cell hybrids showed that the PK3 and MPI loci are 
syntenic (Shows, 1972). By cell hybridization studies, Van Heyningen et al. (1975) found that
the MPI and PK3 loci are on chromosome 15. Chern et al. (1977) narrowed the assignment to
15q22-qter. Tani et al. (1988) isolated and sequenced 2 overlapping clones covering the entire
coding sequence of PKM2. By in situ hybridization they demonstrated that the gene is located 
at band 15q22. Northern blot analysis with RNA from a human hepatoma demonstrated that 
the M2-type PK was predominantly expressed in hepatoma cells, whereas L-type PK was
preferentially expressed in the nontumor portion of the liver. Takenaka et al. (1991) reported
that the gene that encodes both the M1 and the M2 isozymes is approximately 32 kb long and
comprises 12 exons and 11 introns. Exons 9 and 10 contain sequences specific for the M1 and
M2 types, respectively, indicating that the human fetal and adult isozymes are produced from
the same gene by alternative splicing. 

The disclosed NOV35 nucleic acid of the invention encoding a pyruvate kinase-like 
protein includes the nucleic acid whose sequence is provided in Table 35A or a fragment
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may
be changed from the corresponding base shown in Table 35A while still encoding a protein 
that maintains its pyruvate kinase-like activities and physiological functions, or a fragment of 
such a nucleic acid. The invention further includes nucleic acids whose sequences are 
complementary to those just described, including nucleic acid fragments that are 
complementary to any of the nucleic acids just described. The invention additionally includes 
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include 
chemical modifications. Such modifications include, by way of nonlimiting example, 
modified bases, and nucleic acids whose sugar phosphate backbones are modified or 
derivatized. These modifications are carried out at least in part to enhance the chemical 
stability of the modified nucleic acid, such that they may be used, for example, as antisense 
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic 
acids, and their complements, up to about 16 percent of the bases may be so changed.

The disclosed NOV35 protein of the invention includes the pyruvate kinase-like 
protein whose sequence is provided in Table 35B. The invention also includes a mutant or 
variant protein any of whose residues may be changed from the corresponding residue shown 
in Table 35B while still encoding a protein that maintains its pyruvate kinase-like activities 
and physiological functions, or a functional fragment thereof. In the mutant or variant protein,
up to about 19 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this pyruvate kinase-like 
protein (NOV35) may function as a member of a "pyruvate kinase family". Therefore, the 
NOV35 nucleic acids and proteins identified here may be useful in potential therapeutic 
applications implicated in (but not limited to) various pathologies and disorders as indicated 
below. The potential therapeutic applications for this invention include, but are not limited to: 
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug 
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here. 

The NOV35 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the pyruvate kinase-like 
protein (NOV35) may be useful in gene therapy, and the pyruvate kinase-like protein 
(NOV35) may be useful when administered to a subject in need thereof. By way of 
nonlimiting example, the compositions of the present invention will have efficacy for 
treatment of patients suffering from Cardiomyopathy, Atherosclerosis, Hypertension,
Congenital heart defects, Aortic stenosis, Atrial septal defect (ASD), Atrioventricular (A-V)
canal defect, Ductus arteriosus, Pulmonary stenosis, Subaortic stenosis, Ventricular septal
defect (VSD), valve diseases, Tuberous sclerosis, Scleroderma, Obesity, Transplantation,
Adrenoleukodystrophy, Congenital Adrenal Hyperplasia, Hemophilia,
Hypercoagulation, Idiopathic thrombocytopenic purpura, Immunodeficiencies, Graft versus
host, Von Hippel-Lindau (VHL) syndrome, Alzheimer's disease, Stroke, Tuberous sclerosis,
hypercalcemia, Parkinson's disease, Huntington's disease, Cerebral palsy, Epilepsy,
Lesch-Nyhan syndrome, Multiple sclerosis, Ataxia-telangiectasia, Leukodystrophies, Behavioral
disorders, Addiction, Anxiety, Pain, Neuroprotection, or other pathologies or conditions. The
NOV35 nucleic acid encoding the pyruvate kinase-like protein of the invention, or fragments
thereof, may further be useful in diagnostic applications, wherein the presence or amount of
the nucleic acid or the protein is to be assessed.

NOV35 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV35 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti- 
NOVX Antibodies" section below. The disclosed NOV35 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 
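
The idea of locating hydrophilic regions from a hydrophobicity chart can be sketched in Python as below. The text does not say which hydropathy scale or window size is used; this sketch assumes the Kyte-Doolittle scale, a 7-residue sliding window, and an arbitrary cutoff of -1.5, all of which are illustrative choices. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 7) -> list:
    """Mean hydropathy of every window of the given size along the sequence."""
    seq = "".join(protein.split()).upper()
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]  # KeyError on non-standard residues
    return [sum(scores[i:i + window]) / window for i in range(len(seq) - window + 1)]


def hydrophilic_windows(protein: str, window: int = 7, cutoff: float = -1.5) -> list:
    """(start index, peptide) pairs whose mean hydropathy is at or below the cutoff."""
    seq = "".join(protein.split()).upper()
    profile = hydropathy_profile(seq, window)
    return [(i, seq[i:i + window]) for i, mean in enumerate(profile) if mean <= cutoff]


# A stretch of Table 35B, used only to show how the functions are called.
for start, peptide in hydrophilic_windows("SSFICKTSDVHEVRKVLGEKGKNSKITSKIENHDGG"):
    print(start, peptide)
```

Windows flagged this way are only candidates; whether a given peptide is a useful immunogen would still have to be determined experimentally.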

NOV36 

NOV36 includes two Cis/Trans Peptidyl Prolyl Isomerase-like proteins disclosed
below. The disclosed sequences have been named NOV36a and NOV36b.

NOV36a

A disclosed NOV36a nucleic acid of 600 nucleotides (also referred to as CG57149-01)
encoding a Cis/Trans Peptidyl Prolyl Isomerase-like protein is shown in Table 36A. The
start and stop codons are in bold letters.



Table 36A. NOV36a nucleotide sequence (SEQ ID NO:81). 

ACCAGGAGCCCTGTACTACCAGCCATGGTCAACCCCACCATGTTCTTCAACATCGCCATCAACAGCGAGG 
CCTTGGGGCACGTCTCCTTCGAACTGTTTGCAGACAAGTTTCCAAAGACAGAAAACTTTCGTGCTCTGAG 
CACTGGAGAGAAAGGATTTGGTTATAAGGGTTCCTGCTTTCACAGAATTATTCTAGGGCTTTTGTGTCAG 
GGTGGTGACTTTACATGCCATAATGGCACTGGTGGCAAGTCTGTCTACAGGGAGAAATTTGATGATGAGA 
ACTTCATTCTGAAGCATACAGGTCCTGGCATCTTGTCCATGAAGCATACAGGTCCTGGCATCTTGTCCAT 
GGCAAATGCTGGACCCAACACAAACGATTCCCAGATTTTCATCTGCACTGCCAAGACCGAGTGGTTGGAT 
GGCAAGCATGTGGTCTCTGGGAGGGTGAAAGAAGGCATCAAGATTGTGGAGGCCATGAAGCGCTATGGGT
CCAAGAATGGCAAGAGCAGGAAGAAGATCACCACTGCTGACTGTGGACAACTCTAATAAGTTTGACTTGT 
GTTTTATCTTAACCACCAGACCATTCCTTTTGTAGCTCAG 



In a search of public sequence databases, the NOV36a nucleic acid sequence, located
on chromosome 10, has 288 of 327 bases (88%) identical to a
gb:GENBANK-ID:HSCYCR|acc:Y00052.1 mRNA from Homo sapiens (Human mRNA for T-cell
cyclophilin). Public nucleotide databases include all GenBank databases and the GeneSeq 
patent database. 

The disclosed NOV36a polypeptide (SEQ ID NO:82) encoded by SEQ ID NO:81 has 
173 amino acid residues and is presented in Table 36B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV36a has no signal peptide and is 
likely to be localized at the plasma membrane with a certainty of 0.6000. 



Table 36B. Encoded NOV36a protein sequence (SEQ ID NO:82). 

MVNPTMFFNIAINSEALGHVSFELFADKFPKTENFRALSTGEKGFGYKGSCFHRIILGLL 
CQGGDFTCHNGTGGKSVYREKFDDENFILKHTGPGILSMKHTGPGILSMANAGPNTNDSQ
IFICTAKTEWLDGKHVVSGRVKEGIKIVEAMKRYGSKNGKSRKKITTADCGQL



A search of sequence databases reveals that the NOV36a amino acid sequence has 136 
of 173 amino acid residues (78%) identical to, and 149 of 173 amino acid residues (86%) 
similar to, the 165 amino acid residue ptnr:pir-id:CSHUA protein from human (peptidylprolyl
isomerase (EC 5.2.1.8) A). Public amino acid databases include the GenBank databases,
SwissProt, PDB and PIR.

NOV36b



A disclosed NOV36b nucleic acid of 566 nucleotides (also referred to as CG57149-02)
encoding a Cis/Trans Peptidyl Prolyl Isomerase-like protein is shown in Table 36C. The
start and stop codons are in bold letters.



Table 36C. NOV36b nucleotide sequence (SEQ ID NO:83). 



GTACTACCAGCCATGGTCAACCCCACCATGTTCTTCAACATCGCCATCAACAGCGAGGCC
TTGGGGCACGTCTCCTTCGAACTGTTTGCAGACAAGTTTCCAAAGACAGAAAACTTTCGT 
GCTCTGAGCACTGGAGAGAAAGGATTTGGTTATAAGGGTTCCTGCTTTCACAGAATTATT 
CTAGGGCTTTTGTGTCAGGGTGGTGACTTTACATGCCATAATGGCACTGGTGGCAAGTCT 
GTCTACAGGGAGAAATTTGATGATGAGAACTTCATTCTGAAGCATACAGGTCCTGGCATC 
TTGTCCATGAAGCATACAGGTCCTGGCATCTTGTCCATGGCAAATGCTGGACCCAACACA 
AACGATTCCCAGATTTTCATCTGCACTGCCAAGACCGAGTGGTTGGATGGCAAGCATGTG 
GTCTCTGGCAGGGTGAAAGAAGGCATCAAGATTGTGGAGGCCATGAAGCGCTATGGGTCC 
AAGAATGGCAAGAGCAGGAAGAAGATCACCACTGCTGACTGTGGACAACTCTAATAAGTT
TGACTTGTGTTTTATCTTAACCACCA 



In a search of public sequence databases, the NOV36b nucleic acid sequence has 269 
of 311 bases (86%) identical to a gb:GENBANK-ID:AF139893|acc:AF139893.1 mRNA from
Oryctolagus cuniculus (Oryctolagus cuniculus cyclophilin 18 mRNA, complete cds). Public
nucleotide databases include all GenBank databases and the GeneSeq patent database. 

The disclosed NOV36b polypeptide (SEQ ID NO:84) encoded by SEQ ID NO:83 has 
173 amino acid residues and is presented in Table 36D using the one-letter amino acid code.
Signal P, Psort and/or Hydropathy results predict that NOV36b has no signal peptide and is 
likely to be localized in the cytoplasm with a certainty of 0.4500. 



Table 36D. Encoded NOV36b protein sequence (SEQ ID NO:84). 



MVNPTMFFNIAINSEALGHVSFELFADKFPKTENFRALSTGEKGFGYKGSCFHRIILGLL 
CQGGDFTCHNGTGGKSVYREKFDDENFILKHTGPGILSMKHTGPGILSMANAGPNTNDSQ 
IFICTAKTEWLDGKHVVSGRVKEGIKIVEAMKRYGSKNGKSRKKITTADCGQL



A search of sequence databases reveals that the NOV36b amino acid sequence has 136 
of 173 amino acid residues (78%) identical to, and 149 of 173 amino acid residues (86%) 
similar to, the 165 amino acid residue ptnr:TREMBLNEW-ACC:AAH00689 protein from 
Homo sapiens (Human) (PEPTIDYLPROLYL ISOMERASE A (CYCLOPHILIN A)). Public 
amino acid databases include the GenBank databases, SwissProt, PDB and PIR. 

NOV36b is expressed in at least Epidermis and Lymphoid tissue. This information was
derived by determining the tissue sources of the sequences that were included in the invention 
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or 
RACE sources.

The disclosed NOV36a polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 36E.



Table 36E. BLAST results for NOV36a

Gene Index/Identifier                   Protein/Organism                  Length  Identity        Positives       Expect
                                                                          (aa)

gi|12804335|gb|AAH03026.1|AAH03026      (protein for IMAGE:2823490)       174     136/174 (78%)   149/174 (85%)   2e-70
(BC003026)                              [Homo sapiens]

gi|4033689|sp|P04374|CYPH_BOVIN         PEPTIDYL-PROLYL CIS-TRANS         164     136/174 (78%)   149/174 (85%)   8e-70
                                        ISOMERASE A (PPIASE) (ROTAMASE)
                                        (CYCLOPHILIN A)
                                        (CYCLOSPORIN A-BINDING PROTEIN)

gi|10863927|ref|NP_066953.1|            peptidylprolyl isomerase A        165     136/174 (78%)   149/174 (85%)   1e-69
(NM_021130)                             (cyclophilin A) [Homo sapiens]

gi|68401|pir||CSBOAB                    peptidylprolyl isomerase          163     135/173 (78%)   148/173 (85%)   4e-69
                                        (EC 5.2.1.8) A - bovine

gi|13543666|gb|AAH05982.1|AAH05982      peptidylprolyl isomerase A        165     136/174 (78%)   149/174 (85%)   6e-69
(BC005982)                              (cyclophilin A) [Homo sapiens]



Table 36F lists the domain descriptions from DOMAIN analysis results against
NOV36a. This indicates that the NOV36a sequence has properties similar to those of other
proteins known to contain this domain.



Table 36F. Domain Analysis of NOV36 

gnl|Pfam|pfam00160, pro_isomerase, Cyclophilin type peptidyl-prolyl
cis-trans isomerase

CD-Length = 162 residues, 100.0% aligned

Score = 196 bits (497), Expect = 1e-51



The human parvulin Pin1 is a member of the peptidyl-prolyl cis-trans isomerase group
of proteins, which modulate the assembly, folding, activity, and transport of essential cellular
proteins. Pin1 is a mitotic regulator interacting with a range of proteins that are phosphorylated
before cell division. In addition, an involvement of Pin1 in the tau-related neurodegenerative
brain disorders has recently been shown. In this context, Pin1 becomes depleted from the
nucleus in Alzheimer's disease (AD) neurons when it is redirected to the large amounts of
hyperphosphorylated tau associated with the neurofibrillary tangles. This depletion from the
nucleus may ultimately contribute to neuron cell death. The 131-amino acid residue
parvulin-like human peptidyl-prolyl cis/trans isomerase (PPIase) hPar14 was shown to exhibit
sequence similarity to the regulator enzyme for cell cycle transitions human hPin1, but
specificity for catalyzing pSer(Thr)-Pro cis/trans isomerizations was lacking. That FK and CsA
completely inhibit immune function without completely inhibiting CN suggests that the
inhibition of immune function is not mediated by general CN inhibition but by inhibition of a
subset of CN which is critical for lymphocyte activation.

The disclosed NOV36b nucleic acid of the invention encoding a Cis/Trans Peptidyl
Prolyl Isomerase-like protein includes the nucleic acid whose sequence is provided in Table
36A or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of
whose bases may be changed from the corresponding base shown in Table 36A while still
encoding a protein that maintains its Cis/Trans Peptidyl Prolyl Isomerase-like activities and
physiological functions, or a fragment of such a nucleic acid. The invention further includes
nucleic acids whose sequences are complementary to those just described, including nucleic
acid fragments that are complementary to any of the nucleic acids just described. The
invention additionally includes nucleic acids or nucleic acid fragments, or complements
thereto, whose structures include chemical modifications. Such modifications include, by way
of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones
are modified or derivatized. These modifications are carried out at least in part to enhance the
chemical stability of the modified nucleic acid, such that they may be used, for example, as
antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or
variant nucleic acids, and their complements, up to about 14 percent of the bases may be so
changed.

The disclosed NOV36b protein of the invention includes the Cis/Trans Peptidyl Prolyl
Isomerase-like protein whose sequence is provided in Table 36B. The invention also includes
a mutant or variant protein any of whose residues may be changed from the corresponding
residue shown in Table 36B while still encoding a protein that maintains its Cis/Trans Peptidyl
Prolyl Isomerase-like activities and physiological functions, or a functional fragment thereof.
In the mutant or variant protein, up to about 22 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this Cis/Trans Peptidyl
Prolyl Isomerase-like protein (NOV36) may function as a member of a "Cis/Trans Peptidyl
Prolyl Isomerase family". Therefore, the NOV36 nucleic acids and proteins identified here
may be useful in potential therapeutic applications implicated in (but not limited to) various
pathologies and disorders as indicated below. The potential therapeutic applications for this
invention include, but are not limited to: protein therapeutic, small molecule drug target,
antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or
prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue
regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to)
those defined here.

The NOV36 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the Cis/Trans Peptidyl
Prolyl Isomerase-like protein (NOV36) may be useful in gene therapy, and the Cis/Trans
Peptidyl Prolyl Isomerase-like protein (NOV36) may be useful when administered to a subject
in need thereof. By way of nonlimiting example, the compositions of the present invention
will have efficacy for treatment of patients suffering from CNS disorders, brain disorders
including epilepsy, eating disorders, schizophrenia, ADD; cancer; heart disease; inflammation
and autoimmune disorders including Crohn's disease, IBD, allergies, rheumatoid and
osteoarthritis, inflammatory skin disorders, blood disorders; psoriasis; colon cancer; leukemia;
AIDS; thalamus disorders; metabolic disorders including diabetes and obesity; lung diseases
such as asthma, emphysema, cystic fibrosis; pancreatic disorders including pancreatic
insufficiency and cancer; and prostate disorders including prostate cancer, or other pathologies
or conditions. The NOV36 nucleic acid encoding the Cis/Trans Peptidyl Prolyl Isomerase-like
protein of the invention, or fragments thereof, may further be useful in diagnostic applications,
wherein the presence or amount of the nucleic acid or the protein is to be assessed.

NOV36 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV substances for use in therapeutic
or diagnostic methods. These antibodies may be generated according to methods known in the
art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX Antibodies"
section below. The disclosed NOV36 proteins have multiple hydrophilic regions, each of
which can be used as an immunogen. These novel proteins can be used in assay systems for
functional analysis of various human disorders, which will help in understanding of pathology
of the disease and development of new drug targets for various disorders.

NOV37

A disclosed NOV37 nucleic acid of 660 nucleotides (also referred to as CG57151-01)
encoding a Cis/Trans Peptidyl Prolyl Isomerase-like protein is shown in Table 37A. The start
and stop codons are in bold letters.



Table 37A. NOV37 nucleotide sequence (SEQ ID NO:85). 



ACTAGTCATTCTTCCCAGTAGCTAATGAAGCTGACTTTTAAAAAGAAGGCTGTGAGCTTTGCAGATGCTG 
CTGCCGCCCAGGGCCCCCTGCTTCCAGCCATGGTCAACCCCACCATGTTTTTCCACATTGCTGTCGATGG 
CGAGCCCTTGGGCTGTGTCTCCTTCGAGGTAGAGCTGTTTGCAGACAAGGTTCCAAAGACAGCAGAAAAT 
TTCCATGCTCTGAGCACTGGAGAAAAAGGATTTGGTTATAAGGGTTCCTGCTTTCACAGAATTATTCCAG 
GGTTTACGTGTCAGAGTGGTGACTTCACACGCCATGGTGGCAAGTCCATCTGCAGGGAGAAATTTGATGA 
CAAGAACTTCATCCTGAAGCATACGGGTCCTGGCATCTTGTCCATGGCAAATGCTGGACCCAGCGTGAAC 
GTTTCCCAGTTTTTTATCTGCCCTGCCAAGACAGAGTGGTTGGATTGCAAGCATGTGGTCTTTGGCAAGG 
TGAAAGATGGCATGAATATTGTGGAGGTCATGGAGCACTTGGGGTCCAAGAATGGCAAGATCAGCAAGAA 
GATCACCATTGCTGACTGGACAACTGCAATAAATTTGACGGGTGTTTCTCTTAAAAAAAAAAAAAAAATA
CTGTGACAGACCAAGGTAAATTGTTTTTGA



In a search of public sequence databases, the NOV37 nucleic acid sequence, located on
chromosome 10, has 492 of 581 bases (85%) identical to a (HSCPH192|acc:X52857.1)
cyclophilin-related processed pseudogene mRNA from human. Public nucleotide databases
include all GenBank databases and the GeneSeq patent database.

The disclosed NOV37 polypeptide (SEQ ID NO:86) encoded by SEQ ID NO:85 has 
203 amino acid residues and is presented in Table 37B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV37 has no signal peptide and is 
likely to be localized in the cytoplasm with a certainty of 0.6400. 



Table 37B. Encoded NOV37 protein sequence (SEQ ID NO:86). 



MKLTFKKKAVSFADAAAAQGPLLPAMVNPTMFFHIAVDGEPLGCVSFEVELFADKVPKTA 
ENFHALSTGEKGFGYKGSCFHRIIPGFTCQSGDFTRHGGKSICREKFDDKNFILKHTGPG 
ILSMANAGPSVNVSQFFICPAKTEWLDCKHVVFGKVKDGMNIVEVMEHLGSKNGKISKKI
TIADWTTAINLTGVSLKKKKKIL
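
The relationship between a nucleotide sequence and its encoded polypeptide can be checked by conceptual translation. The following is a minimal sketch under stated assumptions, not the method used to prepare the disclosure: it assumes the standard genetic code, reads from the first base supplied, and stops at the first stop codon. The names `CODON_TABLE` and `translate` are hypothetical. The example fragment is the first ten codons of the NOV37 open reading frame in Table 37A.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA from its first base until a stop codon or the end.

    Whitespace is ignored; any character other than A, C, G or T raises KeyError.
    """
    dna = "".join(dna.split()).upper()
    residues = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


if __name__ == "__main__":
    # First ten codons of the NOV37 coding region, beginning at the ATG start codon.
    print(translate("ATGAAGCTGACTTTTAAAAAGAAGGCTGTG"))
```

To translate a full table entry, the caller would first slice the sequence so that it begins at the start codon; locating that codon is left outside this sketch.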



A search of sequence databases reveals that the NOV37 amino acid sequence has 136 
of 160 amino acid residues (85%) identical to, and 142 of 160 amino acid residues (88%) 
similar to, the 165 amino acid residue ptnr:pir-id:CSHUA protein from human (peptidylprolyl 
isomerase (EC 5.2.1.8) A). Public amino acid databases include the GenBank databases, 
SwissProt, PDB and PIR. 

NOV37 is expressed in at least Bone Marrow, Brain, Cartilage, Cochlea, Colon, 
Epidermis, Kidney, Lung, Mammary gland/Breast, Ovary, Pancreas, Prostate, Stomach, 
Testis, Thymus, Umbilical Vein, Uterus, Vulva, Whole Organism. Expression information
was derived from the tissue sources of the sequences that were included in the derivation of
the sequence of CG57151_01. The sequence is predicted to be expressed in the following
tissues because of the expression pattern of (GENBANK-ID: pir-id:CSHUA peptidylprolyl 
isomerase) a closely related peptidylprolyl isomerase homolog in species human: Kidney, 
Lung. This information was derived by determining the tissue sources of the sequences that 
were included in the invention including but not limited to SeqCalling sources, Public EST 
sources, Literature sources, and/or RACE sources. 

The disclosed NOV37 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 37C. 



Table 37C. BLAST results for NOV37

Gene Index/Identifier                   Protein/Organism                  Length  Identity        Positives       Expect
                                                                          (aa)    (%)             (%)

gi|12804335|gb|AAH03026.1|AAH03026      (protein for IMAGE:2823490)       174     139/171 (81%)   147/171 (85%)   E3-71
(BC003026)                              [Homo sapiens]

gi|4033689|sp|P04374|CYPH_BOVIN         PEPTIDYL-PROLYL CIS-TRANS         164     137/162 (84%)   142/162 (87%)   4e-70
                                        ISOMERASE A (PPIASE) (ROTAMASE)
                                        (CYCLOPHILIN A)
                                        (CYCLOSPORIN A-BINDING PROTEIN)

gi|10863927|ref|NP_066953.1|            peptidylprolyl isomerase A        165     136/162 (83%)   142/162 (86%)   1e-69
(NM_021130)                             (cyclophilin A) [Homo sapiens]

gi|68401|pir||CSBOAB                    peptidylprolyl isomerase          163     136/161 (84%)   141/161 (87%)   2e-69
                                        (EC 5.2.1.8) A - bovine

gi|13937981|gb|AAH07104.1|AAH07104      peptidylprolyl isomerase A        165     136/162 (83%)   141/162 (86%)   2e-69
(BC007104)                              (cyclophilin A) [Homo sapiens]



Table 37D lists the domain descriptions from DOMAIN analysis results against
NOV37. This indicates that the NOV37 sequence has properties similar to those of other
proteins known to contain this domain.

Table 37D. Domain Analysis of NOV37

gnl|Pfam|pfam00160, pro_isomerase, Cyclophilin type peptidyl-prolyl
cis-trans isomerase

CD-Length = 162 residues, 97.5% aligned

Score = 191 bits (484), Expect = 5e-50



The disclosed NOV37 nucleic acid of the invention encoding a Cis/Trans Peptidyl 
Prolyl Isomerase-like protein includes the nucleic acid whose sequence is provided in Table 
37A or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of 
whose bases may be changed from the corresponding base shown in Table 37A while still 
encoding a protein that maintains its Cis/Trans Peptidyl Prolyl Isomerase-like activities and 
physiological functions, or a fragment of such a nucleic acid. The invention further includes 
nucleic acids whose sequences are complementary to those just described, including nucleic 
acid fragments that are complementary to any of the nucleic acids just described. The 
invention additionally includes nucleic acids or nucleic acid fragments, or complements 
thereto, whose structures include chemical modifications. Such modifications include, by way 
of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones 
are modified or derivatized. These modifications are carried out at least in part to enhance the 
chemical stability of the modified nucleic acid, such that they may be used, for example, as 
antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or 
variant nucleic acids, and their complements, up to about 15 percent of the bases may be so
changed. 

The disclosed NOV37 protein of the invention includes the Cis/Trans Peptidyl Prolyl 
Isomerase-like protein whose sequence is provided in Table 37B. The invention also includes 
a mutant or variant protein any of whose residues may be changed from the corresponding 
residue shown in Table 37B while still encoding a protein that maintains its Cis/Trans Peptidyl 
Prolyl Isomerase-like activities and physiological functions, or a functional fragment thereof. 
In the mutant or variant protein, up to about 15 percent of the residues may be so changed. 
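
Whether a candidate variant falls within a stated percentage of changed positions can be checked by counting mismatches against the reference. The sketch below is illustrative only and rests on simplifying assumptions: it compares two sequences of equal length position by position, so it covers substitutions but not insertions or deletions, and it reads "up to about N percent" as a hard cutoff. The function names `percent_changed` and `within_limit` are hypothetical.

```python
def percent_changed(reference: str, variant: str) -> float:
    """Return the percentage of positions at which variant differs from reference.

    Assumption: both sequences have the same length (substitutions only).
    """
    if not reference:
        raise ValueError("reference must not be empty")
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    mismatches = sum(1 for a, b in zip(reference.upper(), variant.upper()) if a != b)
    return 100.0 * mismatches / len(reference)


def within_limit(reference: str, variant: str, max_percent: float) -> bool:
    """True if the variant changes no more than max_percent of the positions."""
    return percent_changed(reference, variant) <= max_percent


if __name__ == "__main__":
    # One substitution in ten bases is a 10 percent change, inside a 15 percent limit.
    print(percent_changed("ACGTACGTAC", "ACGTACGTAA"))
    print(within_limit("ACGTACGTAC", "ACGTACGTAA", 15))
```

Applied to the 660-nucleotide NOV37 sequence, a 15 percent limit corresponds to at most 99 changed bases; for the 203-residue polypeptide it corresponds to at most 30 changed residues.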

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this Cis/Trans Peptidyl
Prolyl Isomerase-like protein (NOV37) may function as a member of a "Cis/Trans Peptidyl
Prolyl Isomerase family". Therefore, the NOV37 nucleic acids and proteins identified here
may be useful in potential therapeutic applications implicated in (but not limited to) various
pathologies and disorders as indicated below. The potential therapeutic applications for this
invention include, but are not limited to: protein therapeutic, small molecule drug target,
antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or
prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue
regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to)
those defined here.

The NOV37 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the Cis/Trans Peptidyl
Prolyl Isomerase-like protein (NOV37) may be useful in gene therapy, and the Cis/Trans
Peptidyl Prolyl Isomerase-like protein (NOV37) may be useful when administered to a subject
in need thereof. By way of nonlimiting example, the compositions of the present invention
will have efficacy for treatment of patients suffering from CNS disorders, brain disorders
including epilepsy, eating disorders, schizophrenia, ADD; cancer; heart disease; inflammation
and autoimmune disorders including Crohn's disease, IBD, allergies, rheumatoid and
osteoarthritis, inflammatory skin disorders, blood disorders; psoriasis; colon cancer; leukemia;
AIDS; thalamus disorders; metabolic disorders including diabetes and obesity; lung diseases
such as asthma, emphysema, cystic fibrosis; pancreatic disorders including pancreatic
insufficiency and cancer; and prostate disorders including prostate cancer, or other pathologies
or conditions. The NOV37 nucleic acid encoding the Cis/Trans Peptidyl Prolyl Isomerase-like
protein of the invention, or fragments thereof, may further be useful in diagnostic applications,
wherein the presence or amount of the nucleic acid or the protein is to be assessed.

NOV37 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV37 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV37 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.

NOV38

A disclosed NOV38 nucleic acid of 600 nucleotides (also referred to as CG57153-01) 
encoding a Cis/Trans Peptidyl Prolyl Isomerase-like protein is shown in Table 38A. The start 
and stop codons are in bold letters. 



Table 38A. NOV38 nucleotide sequence (SEQ ID NO:87). 



AATTTATTGGTTTGTTTGTTTAAAATATTTGTTGGCACTTTGCAGATGCCACTTCCACTGATGTCACCAC 
TGCCAGTGATGGTCACCTCCACCTTGTTCTTTAACTTTGTAGTCAACGGTGAGCACTTGGGCCATGTCTC 
CTTCCAGCTGTTTGCAAAGAAAGTTCCAAAGACAGCAGAAAATGTTCATTTTGTGAGCACTGGAGAGAAA
GGATTTGGCTATAAGTGTTCCTGTTTTCACAGAATTATTCCAGGGTTTATATGCCAGAGTGGTGACTTCA
CATGTCATGATGACACTGGCACAAAGTCCAACTACTGGGAGAAGTCTGATGATGATAACTCCATCCTGAA
GCATACAAGACCTGGCACCTTGTCCATGGCAAATACTGGACGCTACACAAATGGTTTCCAGTTTTTCATC 
TGCACTGCCAAAACTGTGTGGTTGGGTGGCAAGAGTGCAGTCTTTGGCAAGACAAAAGAGGGCTTGAATA 
TCTTGGAAGC(^TGGCGC^CTTTGCTTTCTGGAATGGCAAAACCAGAAAGAAGACCACGATTGACAACTG 
TGGACAACTCCAATAAATTTAACTTATGTTTTGTTTTAAC 



In a search of public sequence databases, the NOV38 nucleic acid sequence, located on
chromosome 16, has 456 of 564 bases (80%) identical to a
gb:GENBANK-ID:AK026569|acc:AK026569.1 mRNA from Homo sapiens (Homo sapiens cDNA:
FLJ22916 fis, clone KAT06406, highly similar to HSCYCR Human mRNA for T-cell cyclophilin).
Public nucleotide databases include all GenBank databases and the GeneSeq patent database.

The disclosed NOV38 polypeptide (SEQ ID NO:88) encoded by SEQ ID NO:87 has 
176 amino acid residues and is presented in Table 38B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV38 has no signal peptide and is 
likely to be localized in the cytoplasm with a certainty of 0.5500.



Table 38B. Encoded NOV38 protein sequence (SEQ ID NO:88). 



MPLPLMSPLPVMVTSTLFFNFVVNGEHLGHVSFQLFAKKVPKTAENVHFVSTGEKGFGYK
CSCFHRIIPGFICQSGDFTCHDDTGTKSNYWEKSDDDNSILKHTRPGTLSMANTGRYTNG
FQFFICTAKTVWLGGKSAVFGKTKEGLNILEAMAHFAFWNGKTRKKTTIDNCGQLQ



A search of sequence databases reveals that the NOV38 amino acid sequence has 113
of 164 amino acid residues (68%) identical to, and 127 of 164 amino acid residues (77%)
similar to, the 164 amino acid residue ptnr:SPTREMBL-ACC:Q9TTC6 protein from
Oryctolagus cuniculus (Rabbit) (CYCLOPHILIN 18). Public amino acid databases include the
GenBank databases, SwissProt, PDB and PIR.

NOV38 is expressed in at least Bone Marrow, Brain, Cartilage, Cochlea, Colon, 
Epidermis, Kidney, Lung, Mammary gland/Breast, Ovary, Pancreas, Prostate, Stomach, 
Testis, Thymus, Umbilical Vein, Uterus, Vulva, and Whole Organism. Expression information
was derived from the tissue sources of the sequences that were included in the derivation of
the sequence of CG57153_01. The sequence is predicted to be expressed in the following
tissues because of the expression pattern of (GENBANK-ID:
gb:GENBANK-ID:AK026569|acc:AK026569.1) a closely related Homo sapiens cDNA: FLJ22916 fis, clone
KAT06406, highly similar to HSCYCR Human mRNA for T-cell cyclophilin homolog in 
species Homo sapiens. This information was derived by determining the tissue sources of the 
sequences that were included in the invention including but not limited to SeqCalling sources, 
Public EST sources, Literature sources, and/or RACE sources. 

The disclosed NOV38 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 38C. 



Table 38C. BLAST results for NOV38

Gene Index/Identifier                   Protein/Organism                  Length  Identity        Positives       Expect
                                                                          (aa)    (%)             (%)

gi|12804335|gb|AAH03026.1|AAH03026      (protein for IMAGE:2823490)       174     115/168 (68%)   128/168 (75%)   6e-58
(BC003026)                              [Homo sapiens]

gi|13937981|gb|AAH07104.1|AAH07104      peptidylprolyl isomerase A        165     114/165 (69%)   127/165 (76%)   2e-57
(BC007104)                              (cyclophilin A) [Homo sapiens]

gi|10863927|ref|NP_066953.1|            peptidylprolyl isomerase A        165     114/165 (69%)   127/165 (76%)   4e-57
(NM_021130)                             (cyclophilin A) [Homo sapiens]

gi|4033689|sp|P04374|CYPH_BOVIN         PEPTIDYL-PROLYL CIS-TRANS         154     114/164 (69%)   126/164 (76%)   4e-57
                                        ISOMERASE A (PPIASE) (ROTAMASE)
                                        (CYCLOPHILIN A)
                                        (CYCLOSPORIN A-BINDING PROTEIN)

gi|6651171|gb|AAF22215.1|AF139893_1     cyclophilin 18                    154     113/164 (68%)   127/164 (76%)   7e-57
(AF139893)                              [Oryctolagus cuniculus]



Table 38D lists the domain descriptions from DOMAIN analysis results against
NOV38. This indicates that the NOV38 sequence has properties similar to those of other
proteins known to contain this domain.

Table 38D. Domain Analysis of NOV38

gnl|Pfam|pfam00160, pro_isomerase, Cyclophilin type peptidyl-prolyl
cis-trans isomerase.

CD-Length = 162 residues, 100.0% aligned

Score = 171 bits (433), Expect = 3e-44



The disclosed NOV38 nucleic acid of the invention encoding a Cis/Trans Peptidyl
Prolyl Isomerase-like protein includes the nucleic acid whose sequence is provided in Table
38A or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of
whose bases may be changed from the corresponding base shown in Table 38A while still
encoding a protein that maintains its Cis/Trans Peptidyl Prolyl Isomerase-like activities and
physiological functions, or a fragment of such a nucleic acid. The invention further includes
nucleic acids whose sequences are complementary to those just described, including nucleic
acid fragments that are complementary to any of the nucleic acids just described. The
invention additionally includes nucleic acids or nucleic acid fragments, or complements
thereto, whose structures include chemical modifications. Such modifications include, by way
of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones
are modified or derivatized. These modifications are carried out at least in part to enhance the
chemical stability of the modified nucleic acid, such that they may be used, for example, as
antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or
variant nucleic acids, and their complements, up to about 20 percent of the bases may be so
changed.

The disclosed NOV38 protein of the invention includes the Cis/Trans Peptidyl Prolyl
Isomerase-like protein whose sequence is provided in Table 38B. The invention also includes
a mutant or variant protein any of whose residues may be changed from the corresponding
residue shown in Table 38B while still encoding a protein that maintains its Cis/Trans Peptidyl
Prolyl Isomerase-like activities and physiological functions, or a functional fragment thereof.
In the mutant or variant protein, up to about 32 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this Cis/Trans Peptidyl
Prolyl Isomerase-like protein (NOV38) may function as a member of a "Cis/Trans Peptidyl
Prolyl Isomerase family". Therefore, the NOV38 nucleic acids and proteins identified here
may be useful in potential therapeutic applications implicated in (but not limited to) various
pathologies and disorders as indicated below. The potential therapeutic applications for this
invention include, but are not limited to: protein therapeutic, small molecule drug target,
antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or
prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue
regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to)
those defined here.

The NOV38 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the Cis/Trans Peptidyl
Prolyl Isomerase-like protein (NOV38) may be useful in gene therapy, and the Cis/Trans
Peptidyl Prolyl Isomerase-like protein (NOV38) may be useful when administered to a subject
in need thereof. By way of nonlimiting example, the compositions of the present invention
will have efficacy for treatment of patients suffering from CNS disorders, brain disorders
including epilepsy, eating disorders, schizophrenia, ADD; cancer; heart disease; inflammation
and autoimmune disorders including Crohn's disease, IBD, allergies, rheumatoid and
osteoarthritis, inflammatory skin disorders, blood disorders; psoriasis; colon cancer; leukemia;
AIDS; thalamus disorders; metabolic disorders including diabetes and obesity; lung diseases
such as asthma, emphysema, cystic fibrosis; pancreatic disorders including pancreatic
insufficiency and cancer; and prostate disorders including prostate cancer, or other pathologies
or conditions. The NOV38 nucleic acid encoding the Cis/Trans Peptidyl Prolyl Isomerase-like
protein of the invention, or fragments thereof, may further be useful in diagnostic applications,
wherein the presence or amount of the nucleic acid or the protein is to be assessed.

NOV38 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV38 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV38 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.
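
Hydrophilic regions of the kind mentioned above are commonly located with a sliding-window hydropathy profile. The sketch below is one illustrative way to do this and is not the procedure of the "Anti-NOVX Antibodies" section: the choice of the Kyte-Doolittle scale, the 7-residue window, and the cutoff of -1.0 are all assumptions, and the names `KYTE_DOOLITTLE`, `hydropathy_profile`, and `hydrophilic_windows` are hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 7) -> list:
    """Return the mean hydropathy of every window of the given length."""
    protein = protein.upper()
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(protein) - window + 1)
    ]


def hydrophilic_windows(protein: str, window: int = 7, cutoff: float = -1.0) -> list:
    """Return 1-based start positions of windows whose mean is at or below cutoff."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, mean in enumerate(profile) if mean <= cutoff]


if __name__ == "__main__":
    # N-terminal 20 residues of the NOV37 polypeptide (Table 37B).
    fragment = "MKLTFKKKAVSFADAAAAQG"
    print(hydrophilic_windows(fragment))
```

Window starts returned by `hydrophilic_windows` mark candidate stretches only; whether a stretch is a useful immunogen would still have to be established experimentally.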

NOV39 

A disclosed NOV39 nucleic acid of 600 nucleotides (also referred to as CG57155-01)
encoding a Cis/Trans Peptidyl Prolyl Isomerase-like protein is shown in Table 39A.

Table 39A. NOV39 nucleotide sequence (SEQ ID NO:89).

CTGAAAACTTTCTTGTAAGACTTTGACCTGCATTATATGATTCTCCTTAATCTTCACAGCATGATTTCGT 
GTTTTTGGACATTTCTATACTGTGATGTGTGTCCCAAAACATGTAAAAATTTTCAGGTCTTGTGCACAGG 
AAAAGCAGGGTTTTCTCAACGTGGCATAAGACTACATTACAAAAATTCCATTTTTCATCGAATAGTACAG 
AATGGCTGGATACAAGGAGGGGATATAGTGTATGGAAAAGGAGATAATGGAGAGTCGATTTATGGTCCAA
CATTTGAAGATGAAAACTTTTCAGTTCCTCATAATAAAAGAGGAGTACTTGGAATGGCCAACAAAGGCCG
TCACAGCAACGGGTCACAATTCTATATCACACTGCAAGCAACTCCTTATCTAGATAGAAAATTTGTGGCT
TTTGGGTATGTATATTGTAGATCTATTTATATAATATTCACACCTGGTAGTAAAAAAGCCCAGAGAAGTA
TGTGCAAGAAACTAACAGTATGTGGTTGTGGGCGTAGTTTTTCAAAGGAAGAAGTAGTCAAATGCTGTAA
CAAGGACAACTCATCTTGAAACACTTACGCAGTGGTGTGT 



In a search of public sequence databases, the NOV39 nucleic acid sequence, located on
chromosome 6, has 257 of 397 bases (64%) identical to a
gb:GENBANK-ID:AF043642|acc:AF043642.1 mRNA from Rattus norvegicus (Rattus norvegicus matrin
cyclophilin (matrin-cyp) mRNA, complete cds). Public nucleotide databases include all 
GenBank databases and the GeneSeq patent database. 

The disclosed NOV39 polypeptide (SEQ ID NO:90) encoded by SEQ ID NO:89 has 
180 amino acid residues and is presented in Table 39B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV39 has a signal peptide and is 
likely to be localized extracellularly with a certainty of 0.5128. The most likely cleavage site 
for a NOV39 peptide is between amino acids 19 and 20. 
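
A predicted cleavage site between residues 19 and 20 divides the polypeptide into a 19-residue signal peptide and a mature chain beginning at residue 20. The sketch below only illustrates that arithmetic with 1-based residue numbering; the function name `split_signal_peptide` is hypothetical, and the example input is the first 30 residues of the NOV39 sequence in Table 39B.

```python
from typing import Tuple


def split_signal_peptide(protein: str, last_signal_residue: int) -> Tuple[str, str]:
    """Split a protein after the given 1-based residue position.

    Returns (signal_peptide, mature_chain).
    """
    if not 0 < last_signal_residue < len(protein):
        raise ValueError("cleavage position must fall inside the sequence")
    return protein[:last_signal_residue], protein[last_signal_residue:]


if __name__ == "__main__":
    # First 30 residues of NOV39; cleavage predicted between residues 19 and 20.
    signal, mature = split_signal_peptide("MILLNLHSMISCFWTFLYCDVCPKTCKNFQ", 19)
    print(signal)
    print(mature)
```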

Table 39B. Encoded NOV39 protein sequence (SEQ ID NO:90). 

MILLNLHSMISCFWTFLYCDVCPKTCKNFQVLCTGKAGFSQRGIRLHYKNSIFHRIVQNGWIQGGDIVYG
KGDNGESIYGPTFEDENFSVPHNKRGVLGMANKGRHSNGSQFYITLQATPYLDRKFVAFGYVYCRSIYII
FTPGSKKAQRSMCKKLTVCGCGRSFSKEEVVKCCNKDNSS



A search of sequence databases reveals that the NOV39 amino acid sequence has 70 of 
87 amino acid residues (80%) identical to, and 80 of 87 amino acid residues (91%) similar to, 
the 278 amino acid residue ptnr:TREMBLNEW-ACC:BAB29003 protein from Mus musculus 
(Mouse) (ADULT MALE HIPPOCAMPUS CDNA, RIKEN FULL-LENGTH ENRICHED 
LIBRARY, CLONE:2900084F20, FULL INSERT SEQUENCE). Public amino acid databases 
include the GenBank databases, SwissProt, PDB and PIR. 

NOV39 is expressed in at least: Kidney, Lung, Lymphoid tissue, Mammary
gland/Breast, Oviduct/Uterine Tube/Fallopian tube, Testis, Whole Organism. Expression
information was derived from the tissue sources of the sequences that were included in the
derivation of the sequence of CG57155_01. The sequence is predicted to be expressed in the
following tissues because of the expression pattern of (GENBANK-ID:
gb:GENBANK-ID:AF043642|acc:AF043642.1) a closely related Rattus norvegicus matrin cyclophilin
(matrin-cyp) mRNA, complete cds homolog in species Rattus norvegicus: Kidney, Lung,
Lymphoid tissue. This information was derived by determining the tissue sources of the 
sequences that were included in the invention including but not limited to SeqCalling sources, 
Public EST sources, Literature sources, and/or RACE sources. 

The disclosed NOV39 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 39C.



Table 39C. BLAST results for NOV39

Gene Index/Identifier                           Protein/Organism                        Length (aa)  Identity (%)   Positives (%)  Expect

gi|12851324|dbj|BAB29003.1| (AK013818)          similar to NK-TUMOR RECOGNITION         278          70/87 (80%)    80/87 (91%)    2e-41
                                                PROTEIN (NATURAL-KILLER CELLS
                                                CYCLOPHILIN-RELATED PROTEIN)
                                                (NK-TR PROTEIN) [Mus musculus]

gi|6841028|gb|AAF28867.1| (AF121134)            cyclophilin [Schistosoma mansoni]       181          65/120 (54%)   89/120 (74%)   3e-35

gi|13929124|ref|NP_113981.1| (NM_031793)        matrin cyclophilin (matrin-cyp)         752          64/117 (54%)   85/117 (71%)   1e-31
                                                [Rattus norvegicus]

gi|6754858|ref|NP_035048.1| (NM_010918)         natural killer tumor recognition        1482         63/117 (53%)   82/117 (69%)   2e-31
                                                [Mus musculus]

gi|8039799|sp|P30415|NKCR_MOUSE                 NK-tumor recognition protein            1453         63/117 (53%)   82/117 (69%)   2e-31
                                                (Natural-killer cells;
                                                cyclophilin-related protein)
                                                (NK-TR protein)





Table 39D lists the domain descriptions from DOMAIN analysis results against
NOV39. This indicates that the NOV39 sequence has properties similar to those of other
proteins known to contain this domain.



Table 39D. Domain Analysis of NOV39

gnl|Pfam|pfam00160, pro_isomerase, Cyclophilin type peptidyl-prolyl
cis-trans isomerase

CD-Length = 162 residues, 67.9% aligned

Score = 141 bits (356), Expect = 3e-35
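
To make the coverage figure concrete, the sketch below converts a domain length and an aligned percentage into an approximate number of aligned domain positions. It assumes that the percentage refers to the fraction of the domain model (the CD-Length) covered by the alignment; the function name is hypothetical.

```python
def aligned_domain_positions(cd_length: int, percent_aligned: float) -> int:
    """Approximate number of domain positions covered by the alignment."""
    if cd_length <= 0 or not 0.0 <= percent_aligned <= 100.0:
        raise ValueError("invalid domain length or percentage")
    return round(cd_length * percent_aligned / 100)


# Values reported for the pfam00160 hit against NOV39.
print(aligned_domain_positions(162, 67.9))
```

On this reading, roughly 110 of the 162 domain positions are covered by the NOV39 alignment.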



The disclosed NOV39 nucleic acid of the invention encoding a Cis/Trans Peptidyl
Prolyl Isomerase-like protein includes the nucleic acid whose sequence is provided in Table
39A or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of
whose bases may be changed from the corresponding base shown in Table 39A while still
encoding a protein that maintains its Cis/Trans Peptidyl Prolyl Isomerase-like activities and 
physiological functions, or a fragment of such a nucleic acid. The invention further includes 
nucleic acids whose sequences are complementary to those just described, including nucleic 
acid fragments that are complementary to any of the nucleic acids just described. The 
invention additionally includes nucleic acids or nucleic acid fragments, or complements 
thereto, whose structures include chemical modifications. Such modifications include, by way 
of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones 
are modified or derivatized. These modifications are carried out at least in part to enhance the 
chemical stability of the modified nucleic acid, such that they may be used, for example, as 
antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or 
variant nucleic acids, and their complements, up to about 36 percent of the bases may be so 
changed. 
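
A nucleic acid complementary to a given strand, as referred to above, is conventionally written as the reverse complement of that strand. The following sketch illustrates the operation on a short toy sequence (not the full Table 39A sequence); it handles only unambiguous A, C, G and T bases, and the function name is illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence (A, C, G, T)."""
    dna = dna.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("sequence contains characters other than A, C, G, T")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGGTCAAC"))  # GTTGACCAT
```

Applying the function twice returns the original strand, which is a convenient sanity check.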

The disclosed NOV39 protein of the invention includes the Cis/Trans Peptidyl Prolyl 
Isomerase-like protein whose sequence is provided in Table 39B. The invention also includes 
a mutant or variant protein any of whose residues may be changed from the corresponding 
residue shown in Table 39B while still encoding a protein that maintains its Cis/Trans Peptidyl 
Prolyl Isomerase-like activities and physiological functions, or a functional fragment thereof. 
In the mutant or variant protein, up to about 20 percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this Cis/Trans Peptidyl 
Prolyl Isomerase-like protein (NOV39) may function as a member of a "Cis/Trans Peptidyl 
Prolyl Isomerase family". Therefore, the NOV39 nucleic acids and proteins identified here 
may be useful in potential therapeutic applications implicated in (but not limited to) various 
pathologies and disorders as indicated below. The potential therapeutic applications for this 
invention include, but are not limited to: protein therapeutic, small molecule drug target, 
antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or 
prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue 
regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to) 
those defined here. 

The NOV39 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the Cis/Trans Peptidyl
Prolyl Isomerase-like protein (NOV39) may be useful in gene therapy, and the Cis/Trans
Peptidyl Prolyl Isomerase-like protein (NOV39) may be useful when administered to a subject
in need thereof. By way of nonlimiting example, the compositions of the present invention
will have efficacy for treatment of patients suffering from CNS disorders, brain disorders
including epilepsy, eating disorders, schizophrenia, ADD; cancer; heart disease; inflammation
and autoimmune disorders including Crohn's disease, IBD, allergies, rheumatoid and
osteoarthritis, inflammatory skin disorders, blood disorders; psoriasis, colon cancer, leukemia,
AIDS; thalamus disorders; metabolic disorders including diabetes and obesity; lung diseases
such as asthma, emphysema, cystic fibrosis; pancreatic disorders including pancreatic
insufficiency and cancer; and prostate disorders including prostate cancer, or other pathologies
or conditions. The NOV39 nucleic acid encoding the Cis/Trans Peptidyl Prolyl Isomerase-like
protein of the invention, or fragments thereof, may further be useful in diagnostic applications,
wherein the presence or amount of the nucleic acid or the protein is to be assessed.

NOV39 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV39 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV39 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 

NOV40 

A disclosed NOV40 nucleic acid of 572 nucleotides (also referred to as CG57157-01)
encoding a Cis/Trans Peptidyl Prolyl Isomerase-like protein is shown in Table 40A. The start
and stop codons are in bold letters.



Table 40A. NOV40 nucleotide sequence (SEQ ID NO:91). 

TTTGTAGTCATCGCTGCCACCTGAAGCCACCTGCCTCTAGCCATGGTCAACGCCCCACTGTGTTCTTTTG 
ACATCATTGTTGATGGTAACTCCTTTGGCCCATGCAGCTCCTTCGAGCTGTTTGCCGACAAAGTTCCAAA 
AACAGTGGAAAACTTTCGTGCACTGAGCACTGGAGGAAAAGGATTTGGTTATAAGGGTTCCTGCTTTCAC 
AGAATTATTCCAGGGTTTATTTTATCTGCCAGAGTGCTGACTTCACACACCATAATAATGCCCCAGTCCA 
TCTACCAGGAGAAATTTGATGATGAGAACTTCATCTTGAAGCACACAGGTCCTGGCATCTTGTCCATGGC
AAATGCTGGCCCGGACACAAATGGTTCCCAGTTTTTCACCTGTGTGGCCAAGACTGAGTGGCTGGATGGC 
AAGCACAAGGTCTTTGGCAAAGTGAGAAGAGGGGTGAATATCATGGAAGCCATGGAGTGCTCTGGGTCCG 
GGAATGGTGAGACTGGCAAGAAGATCACCACTGCCAACTGCGGACAACTCTAATCAATCTGCTTGTGTTT
GATCTTAACCAC



In a search of public sequence databases, the NOV40 nucleic acid sequence, located on
chromosome 17, has 446 of 535 bases (83%) identical to a gb:GENBANK-ID:HSCYCR|acc:Y00052.1
mRNA from Homo sapiens (Human mRNA for T-cell
cyclophilin). Public nucleotide databases include all GenBank databases and the GeneSeq
patent database.

The disclosed NOV40 polypeptide (SEQ ID NO:92) encoded by SEQ ID NO:91 has 
166 amino acid residues and is presented in Table 40B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV40 has no signal peptide and is 
likely to be localized at the plasma membrane with a certainty of 0.6000. 



Table 40B. Encoded NOV40 protein sequence (SEQ ID NO:92). 



MVNAPLCSFDIIVDGNSFGPCSSFELFADKVPKTVENFRALSTGGKGFGYKGSCFHRIIP
GFILSARVLTSHTIIMPQSIYQEKFDDENFILKHTGPGILSMANAGPDTNGSQFFTCVAK
TEWLDGKHKVFGKVRRGVNIMEAMECSGSGNGETGKKITTANCGQL
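
The relationship between the nucleotide sequence of Table 40A and the protein sequence of Table 40B can be sketched as follows. The snippet assumes the standard genetic code and assumes that translation begins at the first ATG in the sequence; it translates only the first 140 nucleotides of Table 40A, and the helper names are illustrative rather than part of the disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_first_orf(dna: str) -> tuple[int, str]:
    """Translate from the first ATG until a stop codon or the end of the sequence.

    Returns the 0-based start index and the translated residues.
    """
    start = dna.find("ATG")
    if start < 0:
        raise ValueError("no ATG start codon found")
    residues = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return start, "".join(residues)


# First 140 nucleotides of the NOV40 sequence in Table 40A.
nov40_prefix = (
    "TTTGTAGTCATCGCTGCCACCTGAAGCCACCTGCCTCTAGCCATGGTCAACGCCCCACTGTGTTCTTTTG"
    "ACATCATTGTTGATGGTAACTCCTTTGGCCCATGCAGCTCCTTCGAGCTGTTTGCCGACAAAGTTCCAAA"
)
start, protein = translate_first_orf(nov40_prefix)
print(start + 1)  # 1-based position of the start codon
print(protein)    # MVNAPLCSFDIIVDGNSFGPCSSFELFADKVP
```

The translated fragment matches the first 32 residues of Table 40B, with the start codon at nucleotide position 43.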



A search of sequence databases reveals that the NOV40 amino acid sequence has 122 
of 166 amino acid residues (73%) identical to, and 131 of 166 amino acid residues (78%) 
similar to, the 165 amino acid residue ptnr:pir-id:CSHUA protein from human (peptidylprolyl 
isomerase (EC 5.2.1.8) A). Public amino acid databases include the GenBank databases, 
SwissProt, PDB and PIR. 

NOV40 is expressed in at least Brain. Expression information was derived from the 
tissue sources of the sequences that were included in the derivation of the sequence of 
CG57157_01. The sequence is predicted to be expressed in the following tissues because of the
expression pattern of (GENBANK-ID: gb:GENBANK-ID:HSCYCR|acc:Y00052.1) a closely 
related Human mRNA for T-cell cyclophilin homolog in species Homo sapiens: Brain. This 
information was derived by determining the tissue sources of the sequences that were included 
in the invention including but not limited to SeqCalling sources, Public EST sources, 
Literature sources, and/or RACE sources. 

The disclosed NOV40 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 40C.

Table 40C. BLAST results for NOV40

Gene Index/Identifier                           Protein/Organism                        Length (aa)  Identity (%)   Positives (%)  Expect

gi|14743515|ref|XP_017224.2| (XM_017224)        hypothetical protein XP_017224          165          121/166 (72%)  130/166 (77%)  e-59
                                                [Homo sapiens]

gi|12804335|gb|AAH03026.1|AAH03026 (BC003026)   (protein for IMAGE:2823490)             174          122/166 (73%)  131/166 (78%)  e-59
                                                [Homo sapiens]

gi|4033689|sp|P04374|CYPH_BOVIN                 PEPTIDYL-PROLYL CIS-TRANS ISOMERASE A   164          122/166 (73%)  131/166 (78%)  4e-59
                                                (PPIASE) (ROTAMASE) (CYCLOPHILIN A)
                                                (CYCLOSPORIN A-BINDING PROTEIN)

gi|10863927|ref|NP_066953.1| (NM_021130)        peptidylprolyl isomerase A              165          122/166 (73%)  131/166 (78%)  5e-59
                                                (cyclophilin A) [Homo sapiens]

gi|68401|pir||CSBOAB                            peptidylprolyl isomerase                163          119/162 (73%)  128/162 (78%)  9e-59
                                                (EC 5.2.1.8) A - bovine
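
For readers who wish to process the tabulated BLAST fields programmatically, the sketch below parses an identity or positives cell such as "122/166 (73%)" and an Expect cell. Reading a bare "e-59" as 1e-59 is an assumption based on the common report convention of omitting a leading 1; the function names are hypothetical.

```python
import re


def parse_fraction(cell: str) -> tuple[int, int, int]:
    """Parse a cell like '122/166 (73%)' into (matches, aligned_length, percent)."""
    m = re.fullmatch(r"\s*(\d+)/(\d+)\s*\((\d+)%\)\s*", cell)
    if m is None:
        raise ValueError(f"unrecognized fraction cell: {cell!r}")
    matches, length, percent = (int(g) for g in m.groups())
    return matches, length, percent


def parse_expect(cell: str) -> float:
    """Parse an Expect value; a bare exponent such as 'e-59' is read as 1e-59."""
    text = cell.strip()
    if text.startswith("e"):
        text = "1" + text
    return float(text)


print(parse_fraction("122/166 (73%)"))
print(parse_expect("e-59") < parse_expect("4e-59"))
```

Parsed this way, the rows of the table can be sorted by Expect value or filtered by identity.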



Table 40D lists the domain descriptions from DOMAIN analysis results against 
NOV40. This indicates that the NOV40 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 40D. Domain Analysis of NOV40 

gnl|Pfam|pfam00160, pro_isomerase, Cyclophilin type peptidyl-prolyl
cis-trans isomerase

CD-Length = 162 residues, 99.4% aligned

Score = 185 bits (469), Expect = 2e-48



The disclosed NOV40 nucleic acid of the invention encoding a Cis/Trans Peptidyl 
Prolyl Isomerase-like protein includes the nucleic acid whose sequence is provided in Table 
40A or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of 
whose bases may be changed from the corresponding base shown in Table 40A while still 
encoding a protein that maintains its Cis/Trans Peptidyl Prolyl Isomerase-like activities and 
physiological functions, or a fragment of such a nucleic acid. The invention further includes 
nucleic acids whose sequences are complementary to those just described, including nucleic 
acid fragments that are complementary to any of the nucleic acids just described. The 
invention additionally includes nucleic acids or nucleic acid fragments, or complements 
thereto, whose structures include chemical modifications. Such modifications include, by way
of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones
are modified or derivatized. These modifications are carried out at least in part to enhance the
chemical stability of the modified nucleic acid, such that they may be used, for example, as
antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or
variant nucleic acids, and their complements, up to about 17 percent of the bases may be so
changed. 
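
The stated limit on changed bases can be illustrated numerically. The sketch below computes the fraction of positions that differ between a reference and a variant of equal length, and the whole number of bases corresponding to a given percentage of a sequence length. It considers substitutions only (no insertions or deletions), treats "about 17 percent" as exactly 17 percent for illustration, and uses hypothetical function names and a toy variant.

```python
def fraction_changed(reference: str, variant: str) -> float:
    """Fraction of positions that differ between two equal-length sequences."""
    if len(reference) != len(variant) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(a != b for a, b in zip(reference, variant))
    return differences / len(reference)


def max_changed_bases(length: int, percent: int) -> int:
    """Largest whole number of bases not exceeding the given percentage of length."""
    return length * percent // 100


# Toy example: one substitution in a 10-base stretch.
print(fraction_changed("ATGGTCAACG", "ATGGTTAACG"))
# 17 percent of the 572-nucleotide NOV40 sequence.
print(max_changed_bases(572, 17))
```

For the 572-nucleotide NOV40 sequence, 17 percent corresponds to at most 97 substituted bases under these assumptions.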

The disclosed NOV40 protein of the invention includes the Cis/Trans Peptidyl Prolyl 
Isomerase-like protein whose sequence is provided in Table 40B. The invention also includes 
a mutant or variant protein any of whose residues may be changed from the corresponding
residue shown in Table 40B while still encoding a protein that maintains its Cis/Trans Peptidyl
Prolyl Isomerase-like activities and physiological functions, or a functional fragment thereof. 
In the mutant or variant protein, up to about 27 percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this Cis/Trans Peptidyl
Prolyl Isomerase-like protein (NOV40) may function as a member of a "Cis/Trans Peptidyl
Prolyl Isomerase family". Therefore, the NOV40 nucleic acids and proteins identified here
may be useful in potential therapeutic applications implicated in (but not limited to) various
pathologies and disorders as indicated below. The potential therapeutic applications for this
invention include, but are not limited to: protein therapeutic, small molecule drug target,
antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or
prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue 
regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to) 
those defined here. 

The NOV40 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the Cis/Trans Peptidyl
Prolyl Isomerase-like protein (NOV40) may be useful in gene therapy, and the Cis/Trans
Peptidyl Prolyl Isomerase-like protein (NOV40) may be useful when administered to a subject
in need thereof. By way of nonlimiting example, the compositions of the present invention
will have efficacy for treatment of patients suffering from CNS disorders, brain disorders
including epilepsy, eating disorders, schizophrenia, ADD; cancer; heart disease; inflammation
and autoimmune disorders including Crohn's disease, IBD, allergies, rheumatoid and
osteoarthritis, inflammatory skin disorders, blood disorders; psoriasis, colon cancer, leukemia,
AIDS; thalamus disorders; metabolic disorders including diabetes and obesity; lung diseases
such as asthma, emphysema, cystic fibrosis; pancreatic disorders including pancreatic
insufficiency and cancer; and prostate disorders including prostate cancer, or other pathologies
or conditions. The NOV40 nucleic acid encoding the Cis/Trans Peptidyl Prolyl Isomerase-like
protein of the invention, or fragments thereof, may further be useful in diagnostic applications,
wherein the presence or amount of the nucleic acid or the protein is to be assessed.

NOV40 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV40 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV40 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.

NOV41 

A disclosed NOV41 nucleic acid of 525 nucleotides (also referred to as CG57159-01)
encoding a Cis/Trans Peptidyl Prolyl Isomerase-like protein is shown in Table 41A. The start
and stop codons are in bold letters.

Table 41A. NOV41 nucleotide sequence (SEQ ID NO:93). 

GCCCAGAACTCCCTGCCACCAGCCATGGCCAACCCCACTGTGTTCTTCAACATTGCAATTGATAGTGAGT 
CCTTGGGCTGCATCTCCTTCAAGCTATTTGCAGACAAAGTTCTAAAGATGGAAGAAAATTTTTGTGCTCT 
GAACACTGGAGAGAAAGTATTTGGTGATAAATGTCCCTGCTTTTACAGAATTATTCCGGGGGTGTGTCAG
GGTGGTGACTTCACACACCATAATGGCACTGGTGGCAAGTCCCTCTACAGCAAGGAATTTGATGATGAGA
ACTTCATCCTAAAGCATACAGCTCCTGGCGTCTTGTCCACGGCAAATGCTGGACCCACCACAAATGGTTC
CCAGTTTTTCTTCTGTACTGCCAAGACAGAGGATGGACAGCATGTGGTCTTTGGCAAGGTGAAAGATGGC
ATGAGTATTGTGGAAGCCCTGGAACGCTCTGGGTCCAGGAATGGTAAGACCAGCAAGAAGATCACAGCTG
CTGACTGTGGACAACTCTAATAAATTTGATTGTTT 



In a search of public sequence databases, the NOV41 nucleic acid sequence, located on
chromosome 11, has 442 of 515 bases (85%) identical to a gb:GENBANK-ID:HSCYCR|acc:Y00052.1
mRNA from Homo sapiens (Human mRNA for T-cell
cyclophilin). Public nucleotide databases include all GenBank databases and the GeneSeq
patent database.

The disclosed NOV41 polypeptide (SEQ ID NO:94) encoded by SEQ ID NO:93 has
161 amino acid residues and is presented in Table 41B using the one-letter amino acid code.
Signal P, Psort and/or Hydropathy results predict that NOV41 has no signal peptide and is
likely to be localized in the plasma membrane with a certainty of 0.600.



Table 41B. Encoded NOV41 protein sequence (SEQ ID NO:94). 



MANPTVFFNIAIDSESLGCISFKLFADKVLKMEENFCALNTGEKVFGDKCPCFYRIIPGV 
CQGGDFTHHNGTGGKSLYSKEFDDENFILKHTAPGVLSTANAGPTTNGSQFFFCTAKTED 
GQHVVFGKVKDGMSIVEALERSGSRNGKTSKKITAADCGQL



A search of sequence databases reveals that the NOV41 amino acid sequence has 125
of 164 amino acid residues (76%) identical to, and 141 of 164 amino acid residues (85%)
similar to, the 165 amino acid residue ptnr:pir-id:CSHUA protein from human (peptidylprolyl
isomerase (EC 5.2.1.8) A). Public amino acid databases include the GenBank databases,
SwissProt, PDB and PIR.

NOV41 is expressed in at least Heart, Placenta, Stomach, Whole Organism. Expression
information was derived from the tissue sources of the sequences that were included in the
derivation of the sequence of CG57159_01. The sequence is predicted to be expressed in the
following tissues because of the expression pattern of (GENBANK-ID: gb:GENBANK-ID:HSCYCR|acc:Y00052.1),
a closely related Human mRNA for T-cell cyclophilin homolog
in species Homo sapiens: signet-ring cell carcinoma cell line. This information was derived
by determining the tissue sources of the sequences that were included in the invention
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or
RACE sources.

The disclosed NOV41 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 41C. 



Table 41C. BLAST results for NOV41

Gene Index/Identifier                           Protein/Organism                        Length (aa)  Identity (%)   Positives (%)  Expect

gi|12804335|gb|AAH03026.1|AAH03026 (BC003026)   (protein for IMAGE:2823490)             174          125/164 (76%)  141/164 (85%)  3e-53
                                                [Homo sapiens]

gi|4033689|sp|P04374|CYPH_BOVIN                 PEPTIDYL-PROLYL CIS-TRANS ISOMERASE A   164          125/164 (76%)  141/164 (85%)  9e-53
                                                (PPIASE) (ROTAMASE) (CYCLOPHILIN A)
                                                (CYCLOSPORIN A-BINDING PROTEIN)

gi|13937981|gb|AAH07104.1|AAH07104 (BC007104)   peptidylprolyl isomerase A              165          125/164 (76%)  141/164 (85%)  9e-53
                                                (cyclophilin A) [Homo sapiens]

gi|10863927|ref|NP_066953.1| (NM_021130)        peptidylprolyl isomerase A              165          125/164 (76%)  141/164 (85%)  1e-52
                                                (cyclophilin A) [Homo sapiens]

gi|68401|pir||CSBOAB                            peptidylprolyl isomerase                163          124/162 (76%)  140/162 (85%)  4e-52
                                                (EC 5.2.1.8) A - bovine



Table 41D lists the domain descriptions from DOMAIN analysis results against
NOV41. This indicates that the NOV41 sequence has properties similar to those of other
proteins known to contain this domain.



Table 41D. Domain Analysis of NOV41 

gnl|Pfam|pfam00160, pro_isomerase, Cyclophilin type peptidyl-prolyl
cis-trans isomerase

CD-Length = 162 residues, 100.0% aligned

Score = 177 bits (448), Expect = 5e-46



The disclosed NOV41 nucleic acid of the invention encoding a Cis/Trans Peptidyl 
Prolyl Isomerase-like protein includes the nucleic acid whose sequence is provided in Table 
41A or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of
whose bases may be changed from the corresponding base shown in Table 41A while still
encoding a protein that maintains its Cis/Trans Peptidyl Prolyl Isomerase-like activities and 
physiological functions, or a fragment of such a nucleic acid. The invention further includes 
nucleic acids whose sequences are complementary to those just described, including nucleic 
acid fragments that are complementary to any of the nucleic acids just described. The 
invention additionally includes nucleic acids or nucleic acid fragments, or complements 
thereto, whose structures include chemical modifications. Such modifications include, by way 
of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones 
are modified or derivatized. These modifications are carried out at least in part to enhance the 
chemical stability of the modified nucleic acid, such that they may be used, for example, as 
antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or
variant nucleic acids, and their complements, up to about 15 percent of the bases may be so
changed. 

The disclosed NOV41 protein of the invention includes the Cis/Trans Peptidyl Prolyl 
Isomerase-like protein whose sequence is provided in Table 41B. The invention also includes
a mutant or variant protein any of whose residues may be changed from the corresponding
residue shown in Table 41B while still encoding a protein that maintains its Cis/Trans Peptidyl
Prolyl Isomerase-like activities and physiological functions, or a functional fragment thereof. 
In the mutant or variant protein, up to about 24 percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this Cis/Trans Peptidyl
Prolyl Isomerase-like protein (NOV41) may function as a member of a "Cis/Trans Peptidyl
Prolyl Isomerase family". Therefore, the NOV41 nucleic acids and proteins identified here
may be useful in potential therapeutic applications implicated in (but not limited to) various
pathologies and disorders as indicated below. The potential therapeutic applications for this
invention include, but are not limited to: protein therapeutic, small molecule drug target,
antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or
prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue
regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to)
those defined here.

The NOV41 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the Cis/Trans Peptidyl
Prolyl Isomerase-like protein (NOV41) may be useful in gene therapy, and the Cis/Trans
Peptidyl Prolyl Isomerase-like protein (NOV41) may be useful when administered to a subject
in need thereof. By way of nonlimiting example, the compositions of the present invention
will have efficacy for treatment of patients suffering from CNS disorders, brain disorders
including epilepsy, eating disorders, schizophrenia, ADD; cancer; heart disease; inflammation
and autoimmune disorders including Crohn's disease, IBD, allergies, rheumatoid and
osteoarthritis, inflammatory skin disorders, blood disorders; psoriasis, colon cancer, leukemia,
AIDS; thalamus disorders; metabolic disorders including diabetes and obesity; lung diseases
such as asthma, emphysema, cystic fibrosis; pancreatic disorders including pancreatic
insufficiency and cancer; and prostate disorders including prostate cancer, or other pathologies
or conditions. The NOV41 nucleic acid encoding the Cis/Trans Peptidyl Prolyl Isomerase-like
protein of the invention, or fragments thereof, may further be useful in diagnostic applications,
wherein the presence or amount of the nucleic acid or the protein is to be assessed.

NOV41 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV41 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti- 
NOVX Antibodies" section below. The disclosed NOV41 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 
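
As a rough illustration of how hydrophilic regions might be located from a hydrophobicity chart, the sketch below computes a sliding-window average over a protein sequence and reports windows whose average falls below a cutoff. The disclosure does not say which hydropathy scale is used; the Kyte-Doolittle scale, the window size of 7 and the cutoff of -1.5 are assumptions chosen for illustration, as are the function names.

```python
# Kyte-Doolittle hydropathy values (assumed scale; higher means more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 7) -> list[float]:
    """Average hydropathy of every window of the given size along the protein."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophilic_windows(protein: str, window: int = 7, cutoff: float = -1.5):
    """Return (1-based start, peptide) for windows whose average is below the cutoff."""
    profile = hydropathy_profile(protein, window)
    return [
        (i + 1, protein[i:i + window])
        for i, score in enumerate(profile)
        if score < cutoff
    ]


# C-terminal 20 residues of the NOV41 sequence in Table 41B.
for start, peptide in hydrophilic_windows("SGSRNGKTSKKITAADCGQL"):
    print(start, peptide)
```

Windows reported by such a scan would be candidate immunogen regions only; the choice of scale, window and cutoff materially affects which regions are reported.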

NOV42 

NOV42 includes two Cis/Trans Peptidyl Prolyl Isomerase-like proteins disclosed 
below. The disclosed sequences have been named NOV42a and NOV42b. 

NOV42a 

A disclosed NOV42a nucleic acid of 720 nucleotides (also referred to as CG57226-01) 
encoding a Cis/Trans Peptidyl Prolyl Isomerase-like protein is shown in Table 42A. The start
and stop codons are in bold letters. 



Table 42A. NOV42a nucleotide sequence (SEQ ID NO:95). 

CATCAGGAAAATGCAAATCAAACCACAACGAGATATCATGTCACACCAATTAGGATGGCCACTATTAAAA
ACATAAAATTAATAAGCATTGGCAAGGATGTAGAAATTAGAACACCTGTGCACTGTTGGTGGGAATATAA
AATGATGCAGCTGGCTTTGCAGACACTGCTGTCCCCCAACACCCCCTGTCACTAGGCCATGGTCATCCCG
ACTGTGCCCTTCAACATCACCATCAACAGCAAGCCCTTAGGACACATCTCCTTTCAGCTATTTGCAGACA
AATTTCCAAAGACAGGAGAAAACTTTCACACTCTGAACAATAAAGACAAAGGATTTGGTTCCTGCTTTCA
CAGAATTATTCCGGAGTTTATATGCCAGGGTGATGACTTCACACCCCATAATGGCATTGGTGGCAAGTCC
ATCTACGGGGATAAATTTGATGATAAGAACTTTATTGTGAAGCATACAGGTCTTGGCATCTTGTCCATGG
CAAATGCTGCACCCAAAACAAATGAGTCCCAGTTTTTCATCTGCACTGCCATGGCCAAATGGTGGGATGG
CAAGCATGTGATCTTTGGCAGGGTGAAAGAGGGCATGAATATTGTGGAAGCCATGGAATGCTTTGGGTCC
AGGAATGGCAAGACAAGCAAGATCGCCATTGCCAACTGCAGACAACTCTGATAAATTTGACTTGTGTTTT
ATCTTAACCACCAGACCTTT



In a search of public sequence databases, the NOV42a nucleic acid sequence, located
on chromosome 11, has 338 of 387 bases (87%) identical to a gb:GENBANK-ID:AK026569|acc:AK026569.1
mRNA from Homo sapiens (Homo sapiens cDNA: FLJ22916
fis, clone KAT06406, highly similar to HSCYCR Human mRNA for T-cell cyclophilin).
Public nucleotide databases include all GenBank databases and the GeneSeq patent database.

The disclosed NOV42a polypeptide (SEQ ID NO:96) encoded by SEQ ID NO:95 has
160 amino acid residues and is presented in Table 42B using the one-letter amino acid code.
Signal P, Psort and/or Hydropathy results predict that NOV42a has no signal peptide and is
likely to be localized in the microbody with a certainty of 0.6400.



Table 42B. Encoded NOV42a protein sequence (SEQ ID NO:96). 



MVIPTVPFNITINSKPLGHISFQLFADKFPKTGENFHTLNNKDKGFGSCFHRIIPEFICQ 
GDDFTPHNGIGGKSIYGDKFDDKNFIVKHTGLGILSMANAAPKTNESQFFICTAMAKWWD
GKHVIFGRVKEGMNIVEAMECFGSRNGKTSKIAIANCRQL



A search of sequence databases reveals that the NOV42a amino acid sequence has 118
of 164 amino acid residues (71%) identical to, and 135 of 164 amino acid residues (82%)
similar to, the 165 amino acid residue ptnr:pir-id:CSHUA protein from human (peptidylprolyl
isomerase (EC 5.2.1.8) A). Public amino acid databases include the GenBank databases,
SwissProt, PDB and PIR.

NOV42a is expressed in at least Brain and Peripheral Blood. Expression information
was derived from the tissue sources of the sequences that were included in the derivation of
the sequence of CG57226_01. The sequence is predicted to be expressed in the following
tissues because of the expression pattern of (GENBANK-ID: gb:GENBANK-ID:AK026569|acc:AK026569.1),
a closely related Homo sapiens cDNA: FLJ22916 fis, clone
KAT06406, highly similar to HSCYCR Human mRNA for T-cell cyclophilin homolog in
species Homo sapiens: signet-ring cell carcinoma cell_line. This information was derived by
determining the tissue sources of the sequences that were included in the invention including
but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE
sources.

NOV42b 

A disclosed NOV42b nucleic acid of 600 nucleotides (also referred to as CG57226-02)
encoding a Cis/Trans Peptidyl Prolyl Isomerase-like protein is shown in Table 42C. The
start and stop codons are in bold letters.

Table 42C. NOV42b nucleotide sequence (SEQ ID NO:97).



CTGTGCACTGTTGGTGGGAATATAAAATGATGCAGCTGGCTTTGCAGACACTGCTGTCCC 
CCAACACCCCCTGTCACTAGGCCATGGTCATCCCGACTGTGCCCTTCAACATCACCATCA
ACAGCAAGCCCTTAGGACACATCTCCTTTCAGCTATTTGCAGACAAATTTCCAAAGACAG
GAGAAAACTTTCACACTCTGAACAATAAAGACAAAGGATTTGGTTCCTGCTTTCACAGAA
TTATTCCGGAGTTTATATGCCAGGGTGATGACTTCACACCCCATAATGGCATTGGTGGCA
AGTCCATCTACGGGGATAAATTTGATGATAAGAACTTTATTGTGAAGCATACAGGTCTTG
GCATCTTGTCCATGGCAAATGCTGCACCCAAAACAAATGAGTCCCAGTTTTTCATCTGCA 
CTGCCATGGCCAAATGGTGGGATGGCAAGCATGTGATCTTTGGCAGGGTGAAAGAGGGCA 
TGAATATTGTGGAAGCCATGGAATGCTTTGGGTCCAGGAATGGCAAGACAAGCAAGATCG 
CCATTGCCAACTGCAGACAACTCTGATAAATTTGACTTGTGTTTTATCTTAACCACCAGA 



In a search of public sequence databases, the NOV42b nucleic acid sequence, located
on chromosome 11, has 335 of 382 bases (87%) identical to a gb:GENBANK-ID:AK026569|acc:AK026569.1
mRNA from Homo sapiens (Homo sapiens cDNA: FLJ22916
fis, clone KAT06406, highly similar to HSCYCR Human mRNA for T-cell cyclophilin).
Public nucleotide databases include all GenBank databases and the GeneSeq patent database.

The disclosed NOV42b polypeptide (SEQ ID NO:98) encoded by SEQ ID NO:97 has
160 amino acid residues and is presented in Table 42D using the one-letter amino acid code.
Signal P, Psort and/or Hydropathy results predict that NOV42b has no signal peptide and is
likely to be localized in the microbody with a certainty of 0.6400.



Table 42D. Encoded NOV42b protein sequence (SEQ ID NO:98).

MVIPTVPFNITINSKPLGHISFQLFADKFPKTGENFHTLNNKDKGFGSCFHRIIPEFICQ
GDDFTPHNGIGGKSIYGDKFDDKNFIVKHTGLGILSMANAAPKTNESQFFICTAMAKWWD
GKHVIFGRVKEGMNIVEAMECFGSRNGKTSKIAIANCRQL

A search of sequence databases reveals that the NOV42b amino acid sequence has
118 of 164 amino acid residues (71%) identical to, and 135 of 164 amino acid residues (82%)
similar to, the 165 amino acid residue ptnr:pir-id:CSHUA protein from human (peptidylprolyl
isomerase (EC 5.2.1.8) A). Public amino acid databases include the GenBank databases,
SwissProt, PDB and PIR.

NOV42b is expressed in at least brain and peripheral blood. This information was
derived by determining the tissue sources of the sequences that were included in the invention,
including but not limited to SeqCalling sources, public EST sources, literature sources, and/or
RACE sources.

The disclosed NOV42a polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 42E.



Table 42E. BLAST results for NOV42a

Gene Index/Identifier                    Protein/Organism                              Length (aa)  Identity (%)    Positives (%)   Expect

gi|12804335|gb|AAH03026.1|AAH03026       protein for IMAGE:2823490) [Homo sapiens]     174          118/164 (71%)   135/164 (81%)   1e-60
(BC003026)

gi|10863927|ref|NP_066953.1|             peptidylprolyl isomerase A (cyclophilin A)    165          118/164 (71%)   135/164 (81%)   8e-60
(NM_021130)                              [Homo sapiens]

gi|4033689|sp|P04374|CYPH_BOVIN          PEPTIDYL-PROLYL CIS-TRANS ISOMERASE A         164          118/164 (71%)   135/164 (81%)   8e-60
                                         (PPIASE) (ROTAMASE) (CYCLOPHILIN A)
                                         (CYCLOSPORIN A-BINDING PROTEIN)

gi|13937981|gb|AAH07104.1|AAH07104       peptidylprolyl isomerase A (cyclophilin A)    165          118/164 (71%)   135/164 (81%)   9e-60
(BC007104)                               [Homo sapiens]

gi|14743515|ref|XP_017224.2|             hypothetical protein XP_017224                165          118/164 (71%)   135/164 (81%)   3e-59
(XM_017224)                              [Homo sapiens]




Table 42F lists the domain descriptions from DOMAIN analysis results against
NOV42. This indicates that the NOV42 sequence has properties similar to those of other proteins
known to contain this domain.



Table 42F. Domain Analysis of NOV42

gnl|Pfam|pfam00160, pro_isomerase, Cyclophilin type peptidyl-prolyl
cis-trans isomerase.

CD-Length = 162 residues, 100.0% aligned 
Score = 185 bits (469), Expect = 2e-48 



The disclosed NOV42 nucleic acid of the invention encoding a Cis/Trans Peptidyl 
Prolyl Isomerase-like protein includes the nucleic acid whose sequence is provided in Table 
42A or 42C or a fragment thereof. The invention also includes a mutant or variant nucleic 
acid any of whose bases may be changed from the corresponding base shown in Table 42A or 
42C while still encoding a protein that maintains its Cis/Trans Peptidyl Prolyl Isomerase-like 
activities and physiological functions, or a fragment of such a nucleic acid. The invention 
further includes nucleic acids whose sequences are complementary to those just described, 
including nucleic acid fragments that are complementary to any of the nucleic acids just 
described. The invention additionally includes nucleic acids or nucleic acid fragments, or 
complements thereto, whose structures include chemical modifications. Such modifications 
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least 
in part to enhance the chemical stability of the modified nucleic acid, such that they may be 
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 13 percent of the
bases may be so changed.
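
The complementary nucleic acids mentioned above can be derived mechanically from a disclosed strand. The following is a minimal sketch, with a hypothetical function name, that computes the reverse complement of a DNA string by Watson-Crick pairing; the example input is the first ten bases of Table 42C. It handles only unmodified A, C, G and T bases, so the chemically modified bases described in the paragraph are outside its scope.

```python
# Watson-Crick pairing for unmodified DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of dna, read 5' to 3'.

    Whitespace is ignored and case is normalized. Any character other
    than A, C, G or T raises ValueError instead of being guessed at.
    """
    seq = "".join(dna.split()).upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unsupported characters: {sorted(unexpected)}")
    return seq.translate(_COMPLEMENT)[::-1]


# First ten bases of the NOV42b sequence in Table 42C.
print(reverse_complement("CTGTGCACTG"))
```

Applying the function twice returns the original strand, which is a quick sanity check on any sequence copied from the tables.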

The disclosed NOV42 protein of the invention includes the Cis/Trans Peptidyl Prolyl
Isomerase-like protein whose sequence is provided in Table 42B or 42D. The invention also
includes a mutant or variant protein any of whose residues may be changed from the
corresponding residue shown in Table 42B or 42D while still encoding a protein that
maintains its Cis/Trans Peptidyl Prolyl Isomerase-like activities and physiological functions,
or a functional fragment thereof. In the mutant or variant protein, up to about 29 percent of
the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this Cis/Trans Peptidyl
Prolyl Isomerase-like protein (NOV42) may function as a member of a "Cis/Trans Peptidyl
Prolyl Isomerase family". Therefore, the NOV42 nucleic acids and proteins identified here
may be useful in potential therapeutic applications implicated in (but not limited to) various
pathologies and disorders as indicated below. The potential therapeutic applications for this
invention include, but are not limited to: protein therapeutic, small molecule drug target,
antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or
prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue
regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to)
those defined here.

The NOV42 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the Cis/Trans Peptidyl
Prolyl Isomerase-like protein (NOV42) may be useful in gene therapy, and the Cis/Trans
Peptidyl Prolyl Isomerase-like protein (NOV42) may be useful when administered to a subject
in need thereof. By way of nonlimiting example, the compositions of the present invention
will have efficacy for treatment of patients suffering from CNS disorders, brain disorders
including epilepsy, eating disorders, schizophrenia, ADD; cancer; heart disease; inflammation
and autoimmune disorders including Crohn's disease, IBD, allergies, rheumatoid and
osteoarthritis, inflammatory skin disorders, blood disorders; psoriasis, colon cancer, leukemia,
AIDS; thalamus disorders; metabolic disorders including diabetes and obesity; lung diseases
such as asthma, emphysema, cystic fibrosis; pancreatic disorders including pancreatic
insufficiency and cancer; and prostate disorders including prostate cancer, or other pathologies
or conditions. The NOV42 nucleic acid encoding the Cis/Trans Peptidyl Prolyl Isomerase-like
protein of the invention, or fragments thereof, may further be useful in diagnostic applications,
wherein the presence or amount of the nucleic acid or the protein is to be assessed.

NOV42 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV42 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV42 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 



NOV43 

A disclosed NOV43 nucleic acid of 3146 nucleotides (also referred to as CG57538-01)
encoding a ceruloplasmin-like protein is shown in Table 43A. The start and stop codons are in
bold letters.



Table 43A. NOV43 nucleotide sequence (SEQ ID NO:99). 

AATTTGAAAATGAAGGCACTTTTACCATTGACCTTTCTGTTTTTTATTAGTTCTCCAGGTTGGGCAATAG 
ATAGGCACTGCTACATAGGCATTGAAGAAAGCATTTGGAACTATGCTAATGCTGATGAAAACTTTCTCAT
GATTGACACTTGCAGGACACATATGCGATTATTTCTACAAGGAGGTCAAGCGAGGAAGAGCTTTGTTTTT
AAAAAGGCTTTGTATTTTCAATATACTGATAATACATTTCAAAGGATCATTGAAAAACCATCCTGGTTGG 
GATTTTTAGGTCCAATGATTAAAGCAGAGACTGGAGACTTCATTTATGTACATGTAAAAAATAATGCTTC 
AAGAGCTTATAGTTATCATCCTCATGGGCTCACCTACTCCAAAGAAAATGAAGGTGCTATCTATCCTGAT 
AATACGACAGGCCTGCAAAAGGAAGATGAATATCTGGAGCCAGGGAAACAATATACCTACAAGTGGTATG
TAGAAGAACATCAGGGACCTGGCCCCAATGACAGTAATTGTGTGACAAGAATTTACCATTCCCATATAGA 
CACTGCAAGAGATGTAGCTTCGGGACTTATTGGACCAATACTGACTTGTAAAAGAGGTACACTGAATGGA 
GACACTGAAAAAGATATTGACAGGTCTTCTTTTCTGATGTTTTCTACAACTGATGAAAGCAGAAGCTGGT 
ATAGTGATGAAAATATTCGTGCATTTACTGAATCTGGCAAGATTAATACTAGTGATCCCCGTTTTGAGGA
GAGCATGAGCATGCAAGCAATAAATGGATACATCTATGGAAATCTGCCCAATCTCACCATGTGTGCTGAA
GATAGGGTCCAGTGGTATTTTGTTGGCATGGGTGGCGTGGCTGACATACACCCCGTCTACCTCCGCGGAC 
AAACTCTGATCTCTCGGAATCACAGAAAGGACACCATTATGCTCTTCCCCTCCTCACTGGAAGATGCCTT 
CATGGTGGCCAAGGCCCCTGGAGTGTGGATGCTGGGATGCCAGATGCAGGCATTTTTCAAAGTAAGTAAT
TGCCAGAAACCTTCAACAGAAGCCTTTGTTACTGGGACACATGTTATACATTACTATATTGCTGCTAAAG 
AAATTCTTTGGAACTATGCTCCATCTGGTATAGATTTCTTCACTAAAAAAAATTTAACAGCAGCTGGAAG 
TAAATCCCAGTTATTTTTTGAACGAAGTCCAACCAGAATTGGAGGAACTAACAAAAAACTGATTTACCGT 
GAATACACAGATGCTTCCTTCCAAACACAGAAGGCAAGAGAAGAACACCTTGGAATCCTAGGCCCCGTTA 
TTAAGGCAGAGGTGAGACAGACCATCAAAATCACTTTCTATAACAATGCTTCCCTGCCACTCAGCATTCA 
GCCTCCTGGACTGCATTACAACAAGAGCTTGTGGCAGAGTTATTACTTTAGTTCCTATTCAACTGTCACC 
CAAAGAGAAAGATCTGTTCCTCCACCCTCTTCACATGTAAGTCCTGGCACAACATTTGTCTATACATGGG 
AAGTTCCAAAAGATGTGGGTCCCACCTCCACAGATCCCAACTGCTTGACCTGGTTCTATTACTCTTCAGT 
AAATGGGAAAAAAGACATCAACAGTGGCCTTCTGGGGCCTCTCCTTATATGTAGAAATGGAAGTCTTGGA 
GACGATGGCAAACAGAAAGGAGTAGACAAAGAGTTTTACCTACTTGCCACAATATTTGATGAAAATGAAA
GTAATCTCTTGGATGAAAATATCAGAACATTTATCACAGAGCCTGAAAACATAGATAAAGAGGATACAGA
CTGCCAAGCCTCAAATAAGATGTACGCCATAAATGGATACATGTATGGAAATCTGCCTGGATTGGACACG
TGCTTAGGAGACAACGTTTTGTGGCACGTTTTTAGTGTAGGATCAGTGGAAGATTTACACGGGATATATT 
TTTCAGGAAATACCTTCACTTCTTTAGGAGCAAGAAGGGACACAATACCTATGTTTCCTTATACTTCTCA 
GACGCTTTTGATGACACCTGATTCTATAGGTACTTTTGATTTGGTTTGCATGACAATAAAGCACAATCTA
GGAGGCATGAAACATAAATATCACGTGAGGCAATGTGGGAAGCCAAACCCTGATCAAACACAATACCAGG
AGGAGAAAATAATTATTACCATTGCAGCCGAGGAAATGGAATGGGATTATTCTCCTAGTAGAAAGTGGGA
GAATGAACTCCACCACTTACGAAGAGAGAGCCAAACGAGCATGTATGTGGACAGAAGTGGAACACTTCTT
GGGTCCAAATACAAGAAAGTCTTATATCGTCAATATGATGATAACACGTCACAAATCAAACAAAAAGGAA
TGAGGGTGAAAAACATCTCGATACTAGGTCCATTAATATTGCTCAACCCTGGTCAAATAATTCAAATTAT
CTTTAAAAATAAAGCCGCAAGACCGTATTCTATTCATGCTCATGGAGTGAAAACAAATAATTCCACTGTT
GTTCCAACTCAGCCAGGTGAGATTCAAATATATACTTGGCAGATACCTGATAGAACTGGTCCTACCTCAC
TGGACTTTGAATGCATACCTTGGTTTTACTATTCAACTGTATCTGTGGCTAAGGACCTTCACAGTGGACT
GGTAGGCCCTCTCTCTGTATGCCGCAAAGACATCAACCCCAACATAGTTCACCGTGTTCTCCACTTCATG
ATATTTGATGAGAATGAATCCTGGTACTTCGAAGACAGTATCAACACCTATGCTTCAAAACCAAACAAAG
TGGACAAGGAAAATGATAATTTTCAACTCAGCAACCAAATGCACGCAATTAACGGAAGACT
TAACCAAGGTATAACATTCCATGTTGGGGATGTAGTGAATTGGTATCTGATTGGCATAGGGAATGAAGCT
GGAGTGTATCAATCTGATGTTTATGACCTTCCTCCTGGGGTCTATCGAACTGTAAAAATGTATCGAAGAG
ATGTTGGAACCTGGTTATTTTATTGCCATGTTTTTGAGCACATTGGTGCTGGAATGGAAAGCACTTACAC
TGTACTTGAAAGAAAAGGTAAGATCCATTGGCTAAATTAATTAGAAGTGATATTTAAACAAATGCA



In a search of public sequence databases, the NOV43 nucleic acid sequence, located on
chromosome 3, has 1113 of 1697 bases (65%) identical to a
gb:GENBANK-ID:HUMCERP|acc:M13699.1 mRNA from Homo sapiens (Human ceruloplasmin
(ceruloplasmin) mRNA, complete cds). Public nucleotide databases include all GenBank
databases and the GeneSeq patent database.

The disclosed NOV43 polypeptide (SEQ ID NO:100) encoded by SEQ ID NO:99 has
1036 amino acid residues and is presented in Table 43B using the one-letter amino acid code.
SignalP, PSORT and/or Hydropathy results predict that NOV43 has a signal peptide and is
likely to be localized extracellularly with a certainty of 0.4085. The most likely cleavage site
for a NOV43 peptide is between amino acids 19 and 20.
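
The predicted cleavage between amino acids 19 and 20 can be made concrete by splitting the N-terminus of the NOV43 polypeptide at that position. The sketch below is illustrative: the function name is hypothetical, and the input is only the first 30 residues of Table 43B rather than the full sequence.

```python
def split_at_cleavage(protein: str, last_signal_residue: int) -> tuple[str, str]:
    """Split protein after the 1-based residue last_signal_residue.

    Returns (signal_peptide, remainder). Whitespace in the input is ignored.
    """
    seq = "".join(protein.split())
    if not 0 < last_signal_residue < len(seq):
        raise ValueError("cleavage position must fall inside the sequence")
    return seq[:last_signal_residue], seq[last_signal_residue:]


# First 30 residues of the NOV43 polypeptide as listed in Table 43B.
NOV43_N_TERMINUS = "MKALLPLTFLFFISSPGWAIDRHCYIGIEE"

signal_peptide, mature_start = split_at_cleavage(NOV43_N_TERMINUS, 19)
print(signal_peptide)
print(mature_start)
```

Under this reading the mature protein would begin at residue 20; the position itself is a prediction, as stated in the text.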



Table 43B. Encoded NOV43 protein sequence (SEQ ID NO:100). 

MKALLPLTFLFFISSPGWAIDRHCYIGIEESIWNYANADENFLMIDTCRTHMPLFLQGGQ 
ARKSFVFKKALYFQYTDNTFQRIIEKPSWLGFLGPMIKAETGDFIYVHVKNNASRAYSYH
PHGLTYSKENEGAIYPDNTTGLQKEDEYLEPGKQYTYKWYVEEHQGPGPNDSNCVTRIYH 
SHIDTARDVASGLIGPILTCKRGTLNGDTEKDIDRSSFLMFSTTDESRSWYSDENIRAFT 
ESGKINTSDPRFEESMSMQAINGYIYGNLPNLTMCAEDRVQWYFVGMGGVADIHPVYLRG 
QTLISRNHRKDTIMLFPSSLEDAFMVAKAPGVWMLGCQMQAFFKVSNCQKPSTEAFVTGT
HVIHYYIAAKEILWNYAPSGIDFFTKKNLTAAGSKSQLFFERSPTRIGGTNKKLIYREYT 
DASFQTQKAREEHLGILGPVIKAEVRQTIKITFYNNASLPLSIQPPGLHYNKSLWQSYYF
SSYSTVTQRERSVPPPSSHVSPGTTFVYTWEVPKDVGPTSTDPNCLTWFYYSSVNGKKDI 
NSGLLGPLLICRNGSLGDDGKQKGVDKEFYLLATIFDENESNLLDENIRTFITEPENIDK
EDTDCQASNKMYAINGYMYGNLPGLDTCLGDNVLWHVFSVGSVEDLHGIYFSGNTFTSLG
ARRDTIPMFPYTSQTLLMTPDSIGTFDLVCMTIKHNLGGMKHKYHVRQCGKPNPDQTQYQ
EEKIIITIAAEEMEWDYSPSRKWENELHHLRRESQTSMYVDRSGTLLGSKYKKVLYRQYD
DNTSQIKQKGMRVKNISILGPLILLNPGQIIQIIFKNKAARPYSIHAHGVKTNNSTVVPT
QPGEIQIYTWQIPDRTGPTSLDFECIPWFYYSTVSVAKDLHSGLVGPLSVCRKDINPNIV 
HRVLHFMIFDENESWYFEDSINTYASKPNKVDKENDNFQLSNQMHAINGRLFGNNQGITF 
HVGDVVNWYLIGIGNEAGVYQSDVYDLPPGVYRTVKMYRRDVGTWLFYCHVFEHIGAGME
STYTVLERKGKIHWLN 



A search of sequence databases reveals that the NOV43 amino acid sequence has 548
of 994 amino acid residues (55%) identical to, and 719 of 994 amino acid residues (72%)
similar to, the 1069 amino acid residue ptnr:pir-id:KUHU protein from human (ceruloplasmin
(EC 1.16.3.1) precursor [validated]). Public amino acid databases include the GenBank
databases, SwissProt, PDB and PIR.

NOV43 is expressed in at least salivary glands. Expression information was derived
from the tissue sources of the sequences that were included in the derivation of the sequence
of CG57538-01. The sequence is predicted to be expressed in the following tissues because of
the expression pattern of (GENBANK-ID: gb:GENBANK-ID:HUMCERP|acc:M13699.1), a
closely related Human ceruloplasmin (ceruloplasmin) mRNA, complete cds homolog in
species Homo sapiens: liver, secreted into plasma. This information was derived by
determining the tissue sources of the sequences that were included in the invention, including
but not limited to SeqCalling sources, public EST sources, literature sources, and/or RACE
sources.

The disclosed NOV43 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 43C. 



Table 43C. BLAST results for NOV43

Gene Index/Identifier                    Protein/Organism                              Length (aa)  Identity (%)     Positives (%)    Expect

gi|1070458|pir||KUHU                     ferroxidase (EC 1.16.3.1) precursor - human   1069         579/1068 (54%)   760/1068 (70%)   0.0

gi|4557485|ref|NP_000087.1|              ceruloplasmin (ferroxidase); Ceruloplasmin    1065         578/1066 (54%)   758/1066 (70%)   0.0
(NM_000096)                              [Homo sapiens]

gi|1942284|pdb|1KCW|                     X-Ray Crystal Structure Of Human              1048         568/1046 (54%)   746/1046 (71%)   0.0
                                         Ceruloplasmin At 3.0 Angstroms

gi|5281319|gb|AAD41477.1|AF134814_1      ceruloplasmin [Ovis aries]                    1048         577/1054 (54%)   742/1054 (69%)   0.0
(AF134814)

gi|6680997|ref|NP_031778.1|                                                            1062         560/1062 (52%)   737/1062 (68%)   0.0
(NM_007752)


Table 43D lists the domain descriptions from DOMAIN analysis results against 
NOV43. This indicates that the NOV43 sequence has properties similar to those of other 
proteins known to contain this domain.

Table 43D. Domain Analysis of NOV43

gnl|Pfam|pfam00394, Cu-oxidase, Multicopper oxidase. Many of the
proteins in this family contain multiple similar copies of this
plastocyanin-like domain.

CD-Length = 135 residues, 90.4% aligned
Score = 37.7 bits (86), Expect = 0.003



In Wilson disease, the basal ganglia and liver undergo changes that express themselves 
in neurologic manifestations and signs of cirrhosis, respectively. A disturbance in copper 
metabolism is somehow involved in the mechanism. Low ceruloplasmin is found in the serum. 
Shokeir and Shreffler (1969) advanced the hypothesis that ceruloplasmin functions in 
enzymatic transfer of copper to copper-containing enzymes such as cytochrome oxidase. 
Supporting the hypothesis was the finding of markedly reduced levels of activity of 
cytochrome oxidase in Wilson disease and moderate reductions in heterozygotes. An 
abnormality of ceruloplasmin seems to be involved in Wilson disease. The fact that 
individuals with hereditary ceruloplasmin deficiency have profound iron accumulation in most 
tissues suggests that ceruloplasmin is important for normal release of cellular iron 
(Mukhopadhyay et al., 1998). At least 3 variants determined by codominant alleles have been
identified by starch gel electrophoresis (Shreffler et al., 1967). Human ceruloplasmin is
composed of a single polypeptide chain (Takahashi et al., 1984). 

The disclosed NOV43 nucleic acid of the invention encoding a ceruloplasmin-like
protein includes the nucleic acid whose sequence is provided in Table 43A or a fragment
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may
be changed from the corresponding base shown in Table 43A while still encoding a protein
that maintains its ceruloplasmin-like activities and physiological functions, or a fragment of
such a nucleic acid. The invention further includes nucleic acids whose sequences are
complementary to those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 35 percent of the bases may be so changed.

The disclosed NOV43 protein of the invention includes the ceruloplasmin-like protein
whose sequence is provided in Table 43B. The invention also includes a mutant or variant
protein any of whose residues may be changed from the corresponding residue shown in Table
43B while still encoding a protein that maintains its ceruloplasmin-like activities and
physiological functions, or a functional fragment thereof. In the mutant or variant protein, up
to about 45 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this ceruloplasmin-like
protein (NOV43) may function as a member of a "ceruloplasmin family". Therefore, the
NOV43 nucleic acids and proteins identified here may be useful in potential therapeutic
applications implicated in (but not limited to) various pathologies and disorders as indicated
below. The potential therapeutic applications for this invention include, but are not limited to:
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV43 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the ceruloplasmin-like
protein (NOV43) may be useful in gene therapy, and the ceruloplasmin-like protein (NOV43)
may be useful when administered to a subject in need thereof. By way of nonlimiting
example, the compositions of the present invention will have efficacy for treatment of patients
suffering from Wilson disease, dementia, diabetes, retinal degeneration, neurologic
degeneration, xerostomia, or other pathologies or conditions. The NOV43 nucleic acid
encoding the ceruloplasmin-like protein of the invention, or fragments thereof, may further be
useful in diagnostic applications, wherein the presence or amount of the nucleic acid or the
protein is to be assessed.

NOV43 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immunospecifically to the novel NOV43 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV43 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.



NOV44

NOV44 includes two Leucine rich repeat-like proteins disclosed below. The disclosed 
sequences have been named NOV44a and NOV44b. 



NOV44a 

A disclosed NOV44a nucleic acid of 857 nucleotides (also referred to as CG57623-01)
encoding a Leucine rich repeat-like protein is shown in Table 44A. The start and stop codons
are in bold letters.



Table 44A. NOV44a nucleotide sequence (SEQ ID NO:101). 

AACAATAAACTTCTAACACTCCCTTACTAAAAGAACATGGTTAAGGGTGAGAAAGGCCCCAAGGGCAAGA
AGATCACCCTCAAGGTGGCCAGGAATTGCATCAAAATCACTTTTGATGGGAAAAAGCGCCTTGACTTGAG 
CAAGATGGGAATTACCACCTTCCCCAAGTGTATTCTGCGCCTTAGTGACATGGACGAGCTGGACCTTAGC 
CGGAATCTTATCAGGAAGATCCCTGACTCCATCTCCAAGTTCCAGAACCTCCGGTGGCTGGACCTGCACA 
GCAACTACATAGACAAGCTGCCTGAGTCCATTGGCCAGATGACCAGCCTGCTCTACCTCAACGTCAGCAA 
CAACCGGCTGACCAGCAACGGGCTGCCCGTGGAGCTGAAGCAACTCAAGAACATCCGCGCTGTGAACCTA
GGCTTGAACCACCTGGACAGCGTGCCCACCACACTGGGGGCCCTGAAGGAGCTCCACGAGGTAGGGCTCC 
ATGACAACCTACTGAACAACATCCCCGTGAGCATCTCCAAGCTCCCCAAGCTGAAAAAGCTCAACATAAA 
GCGGAACCCCTTTCCAAAGCCAGGTGAGTCGGAAATATTCATAGACTCCATCAGGAGGCTGGAGAACTTG 
TATGTTGTGGAGGAGAAGGATCTGTGTGCGGCTTGCCTGAGAAAATGCCAAAACGCCCGGGACAACCTGA 
ATAGAATCAAGAACATGGCCACGACGACACCGAGAAAGACCATCTTTCCCAATCTGATCTCACCCAATTC
CATGGCCAAGGACTCCTGGGAAGACTGGAGGTGACTTGGAACCTGACCCTGAGGCAGAAGGGAAAGAGAG
AGGGAGGGAAGAGGGCA



In a search of public sequence databases, the NOV44a nucleic acid sequence, located
on chromosome 10, has 188 of 313 bases (60%) identical to a
gb:GENBANK-ID:AB016816|acc:AB016816.1 mRNA from Homo sapiens (Homo sapiens MASL1 mRNA,
complete cds). Public nucleotide databases include all GenBank databases and the GeneSeq
patent database.

The disclosed NOV44a polypeptide (SEQ ID NO:102) encoded by SEQ ID NO:101
has 255 amino acid residues and is presented in Table 44B using the one-letter amino acid
code. SignalP, PSORT and/or Hydropathy results predict that NOV44a has no signal peptide
and is likely to be localized cytoplasmically with a certainty of 0.4500.



Table 44B. Encoded NOV44a protein sequence (SEQ ID NO:102). 

MVKGEKGPKGKKITLKVARNCIKITFDGKKRLDLSKMGITTFPKCILRLSDMDELDLSRN
LIRKIPDSISKFQNLRWLDLHSNYIDKLPESIGQMTSLLYLNVSNNRLTSNGLPVELKQL
KNIRAVNLGLNHLDSVPTTLGALKELHEVGLHDNLLNNIPVSISKLPKLKKLNIKRNPFP
KPGESEIFIDSIRRLENLYVVEEKDLCAACLRKCQNARDNLNRIKNMATTTPRKTIFPNL
ISPNSMAKDSWEDWR



A search of sequence databases reveals that the NOV44a amino acid sequence has 211
of 255 amino acid residues (82%) identical to, and 235 of 255 amino acid residues (92%)
similar to, the 262 amino acid residue ptnr:TREMBLNEW-ACC:BAB29635 protein from
Mus musculus (Mouse) (ADULT MALE TESTIS CDNA, RIKEN FULL-LENGTH
ENRICHED LIBRARY, CLONE:4921523N16, FULL INSERT SEQUENCE). Public amino
acid databases include the GenBank databases, SwissProt, PDB and PIR.

NOV44a is expressed in at least cervix, brain, and testis. This information was derived
by determining the tissue sources of the sequences that were included in the invention,
including but not limited to SeqCalling sources, public EST sources, literature sources, and/or
RACE sources.



NOV44b 

A disclosed NOV44b nucleic acid of 847 nucleotides (also referred to as CG57623-01)
encoding a Leucine rich repeat-like protein is shown in Table 44C. The start and stop
codons are in bold letters.



Table 44C. NOV44b nucleotide sequence (SEQ ID NO: 103). 

CTTCTAACACTCCCTTACTAAAAGAACATGGTTAAGGGTGAGAAAGGCCCCAAGGGCAAG
AAGATCACCCTCAAGGTGGCCAGGAATTGCATCAAAATCACTTTTGATGGGAAAAAGCGC 
CTTGACTTGAGCAAGATGGGAATTACCACCTTCCCCAAGTGTATTCTGCGCCTTAGTGAC 
ATGGACGAGCTGGACCTTAGCCGGAATCTTATCAGGAAGATCCCTGACTCCATCTCCAAG 
TTCCAGAACCTCCGGTGGCTGGACCTGCACAGCAACTACATAGACAAGCTGCCTGAGTCC 
ATTGGCCAGATGACCAGCCTGCTCTACCTCAACGTCAGCAACAACCGGCTGACCAGCAAC 
GGGCTGCCCGTGGAGCTGAAGCAACTCAAGAACATCCGCGCTGTGAACCTAGGCTTGAAC 
CACCTGGACAGCGTGCCCACCATGCTGGGGGCCCTGAAGGAGCTCCACGAGGTAGGGCTC 
CATGACAACCTACTGAACAACATCCCCGTGAGCATCTCCAAGCTCCCCAAGCTGAAAAAG 
CTCAACATAAAGCGGAACCCCTTTCCAAAGCCAGGTGAGTCGGAAATATTCATAGACTCC 
ATCAGGAGGCTGGAGAACTTGTATGTTGTGGAGGAGAAGGATCTGTGTGCGGCTTGCCTG 
AGAAAATGCCAAAACGCCCGGGACAACCTGAATAGAATCAAGAACATGGCCACGACGACA
CCGAGAAAGACCATCTTTCCCAATCTGATCTCACCCAATTCCATGGCCAAGGACTCCTGG
GAAGACTGGAGGTGACTTGGAACCTGAGCCCTGAGGCAGAAGGGAAAGAGAGAGGGAGGG
AAGAGGG 



In a search of public sequence databases, the NOV44b nucleic acid sequence, located on
chromosome 10, has 187 of 313 bases (59%) identical to a
gb:GENBANK-ID:AB016816|acc:AB016816.1 mRNA from Homo sapiens (Homo sapiens MASL1 mRNA,
complete cds). Public nucleotide databases include all GenBank databases and the GeneSeq
patent database.

The disclosed NOV44b polypeptide (SEQ ID NO:104) encoded by SEQ ID NO:103
has 255 amino acid residues and is presented in Table 44D using the one-letter amino acid code.
SignalP, PSORT and/or Hydropathy results predict that NOV44b has no signal peptide and is
likely to be localized cytoplasmically with a certainty of 0.4500.



Table 44D. Encoded NOV44b protein sequence (SEQ ID NO: 104). 



MVKGEKGPKGKKITLKVARNCIKITFDGKKRLDLSKMGITTFPKCILRLSDMDELDLSRN 
LIRKIPDSISKFQNLRWLDLHSNYIDKLPESIGQMTSLLYLNVSNNRLTSNGLPVELKQL 
KNIRAVNLGLNHLDSVPTMLGALKELHEVGLHDNLLNNIPVSISKLPKLKKLNIKRNPFP 
KPGESEIFIDSIRRLENLYVVEEKDLCAACLRKCQNARDNLNRIKNMATTTPRKTIFPNL
ISPNSMAKDSWEDWR 



A search of sequence databases reveals that the NOV44b amino acid sequence has 211
of 255 amino acid residues (82%) identical to, and 235 of 255 amino acid residues (92%) 
similar to, the 262 amino acid residue ptnr:SPTREMBL-ACC:Q9CQ07 protein from Mus 
musculus (Mouse) (4930442L21RIK PROTEIN). Public amino acid databases include the 
GenBank databases, SwissProt, PDB and PIR. 

NOV44b is expressed in at least cervix, brain and testis. This information was derived 
by determining the tissue sources of the sequences that were included in the invention 
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or 
RACE sources. 

The disclosed NOV44 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 44E. 



Table 44E. BLAST results for NOV44

Gene Index/Identifier                    Protein/Organism                              Length (aa)  Identity (%)    Positives (%)   Expect

gi|13385762|ref|NP_080529.1|             RIKEN cDNA 4930442L21 [Mus musculus]          262          211/255 (82%)   235/255 (91%)   e-108
(NM_026253)

gi|12838968|dbj|BAB24391.1|              Leucine Rich Repeat containing protein-data   255          210/255 (82%)   234/255 (91%)   e-108
(AK006063)                               source:Pfam, source key:PF00560,
                                         evidence:ISS-putative [Mus musculus]

gi|12838360|dbj|BAB24176.1|              Leucine Rich Repeat containing protein-data   230          195/228 (85%)   214/228 (93%)   e-97
(AK005666)                               source:Pfam, source key:PF00560,
                                         evidence:ISS-putative [Mus musculus]

gi|10130019|gb|AAG13461.1|AF274972_1     PIDD [Homo sapiens]                           910          48/146 (32%)    87/146 (58%)    2e-18
(AF274972)

gi|12083587|ref|NP_073145.1|             p53 protein induced, with death domain        915          47/146 (32%)    87/146 (59%)    2e-18
(NM_022654)                              [Mus musculus]



Leucine-rich repeats (LRRs) are relatively short motifs (22-28 residues in length) 
found in a variety of cytoplasmic, membrane and extracellular proteins. Although these 
proteins are associated with widely different functions, a common property involves
protein-protein interaction. Little is known about the 3D structure of LRRs, although it is believed that
they can form amphipathic structures with hydrophobic surfaces capable of interacting with 
membranes. In vitro studies of a synthetic LRR from Drosophila Toll protein have indicated 
that the peptides form gels by adopting beta-sheet structures that form extended filaments. 
These results are consistent with the idea that LRRs mediate protein-protein interactions and 
cellular adhesion. Other functions of LRR-containing proteins include, for example, binding to 
enzymes and vascular repair. The 3-D structure of ribonuclease inhibitor, a protein containing 
LRRs, has been determined, revealing LRRs to be a new class of alpha/beta fold. LRRs
form elongated non-globular structures and are often flanked by cysteine rich domains. 

The disclosed NOV44 nucleic acid of the invention encoding a leucine-rich repeat-like 
protein includes the nucleic acid whose sequence is provided in Table 44A or a fragment 
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may 
be changed from the corresponding base shown in Table 44A while still encoding a protein 
that maintains its leucine-rich repeat-like activities and physiological functions, or a fragment 
of such a nucleic acid. The invention further includes nucleic acids whose sequences are 
complementary to those just described, including nucleic acid fragments that are 
complementary to any of the nucleic acids just described. The invention additionally includes 
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include 
chemical modifications. Such modifications include, by way of nonlimiting example, 
modified bases, and nucleic acids whose sugar phosphate backbones are modified or 
derivatized. These modifications are carried out at least in part to enhance the chemical 
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 41 percent of the bases may be so changed.

The disclosed NOV44 protein of the invention includes the leucine-rich repeat-like
protein whose sequence is provided in Table 44B. The invention also includes a mutant or
variant protein any of whose residues may be changed from the corresponding residue shown
in Table 44B while still encoding a protein that maintains its leucine-rich repeat-like activities
and physiological functions, or a functional fragment thereof. In the mutant or variant protein,
up to about 18 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this leucine-rich repeat- 
like protein (NOV44) may function as a member of a "leucine-rich repeat family". Therefore, 
the NOV44 nucleic acids and proteins identified here may be useful in potential therapeutic
applications implicated in (but not limited to) various pathologies and disorders as indicated
below. The potential therapeutic applications for this invention include, but are not limited to:
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug 
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene 
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues 
and cell types composing (but not limited to) those defined here. 

The NOV44 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the leucine-rich repeat-like
protein (NOV44) may be useful in gene therapy, and the leucine-rich repeat-like protein
(NOV44) may be useful when administered to a subject in need thereof. By way of
nonlimiting example, the compositions of the present invention will have efficacy for
treatment of patients suffering from fertility, Von Hippel-Lindau (VHL) syndrome,
Alzheimer's disease, stroke, tuberous sclerosis, hypercalcemia, Parkinson's disease,
Huntington's disease, cerebral palsy, epilepsy, Lesch-Nyhan syndrome, multiple sclerosis,
ataxia-telangiectasia, leukodystrophies, behavioral disorders, addiction, anxiety, pain,
neuroprotection, or other pathologies or conditions. The NOV44 nucleic acid encoding the
leucine-rich repeat-like protein of the invention, or fragments thereof, may further be useful in
diagnostic applications, wherein the presence or amount of the nucleic acid or the protein is
to be assessed.

NOV44 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV44 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti- 
NOVX Antibodies" section below. The disclosed NOV44 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 

NOV45 

NOV45 includes two Ig/fibronectin-like proteins disclosed below. The disclosed 
sequences have been named NOV45a and NOV45b. 

NOV45a 

A disclosed NOV45a nucleic acid of 4321 nucleotides (also referred to as CG57656-01)
encoding an Ig/fibronectin-like protein is shown in Table 45A. The start and stop codons
are in bold letters.



Table 45A. NOV45a nucleotide sequence (SEQ ID NO: 105). 

CTGGTGGGGGGCGGGGTGACCCTGTGACACGGACATGGGGCTGCTGGGGCAGGATCTCTTTGTCACCTCC 

TTTCTGTGTCCAACCTGGCCGTCCCCATCAGGCGCCCACGGCCTGCGAGAGGAGCCCGAGTTTGTGACGG 

CAAGAGCTGGGGAGAGCGTGGTCCTGCGATGCGACGTGATCCACCCAGTGACGGGACAGCCCCCACCCTA 

TGTCGTAGAGTGGTTCAAGTTCGGGGTCCCCATCCCTATCTTCATCAAGTTTGGCTACTACCCGCCGCAC 

GTGGACCCTGAGTATGCAGGTAAGGTCGGCGCCCACGGCCTGCGAGAGGAGCCCGAGTTTGTGACGGCAA 

GAGCTGGGGAGAGCGTGGTCCTGCGATGCGACGTGATCCACCCAGTGACGGGACAGCCCCCACCCTATGT 

CGTAGAGTGGTTCAAGTTCGGGGTCCCCATCCCTATCTTCATCAAGTTTGGCTACTACCCGCCGCACGTG 

GACCCTGAGTATGCAGGTAAGGTCAGTCTTCATGATAAGGCATCTCTGCGGCTGGAACAAGTTCGCTCTG 

AGGACCAGGGCTGGTATGAGTGCAAAGTGCTCATGCTGGACCAGCAGTATGACACCTTCCACAATGGCAG 

CTGGGTCCACCTCACCATCAACGCCCCTCCCACCTTTACAGAAACACCCCCCCAGTACATCGAGGCCAAG 

GAGGGTGGTAGTATCACCATGACCTGCACAGCTTTTGGGAACCCCAAGCCCATTGTCACCTGGCTCAAGG 

AGGGGACGCTCCTCGGTGCTAGTGGGAAATACCAGGTGAGTGTGGTTCTAGGTAGCCTGACAGTGACATC 

GGTCAGTCGGGAGGACAGAGGTGCCTACACCTGCCGAGCGTACAGCATTCAGGGGGAGGCTGTCCACACG 

ACTCACCTGCTTGTCCAAGGGCCCCCTTTCATCGTCTCCCCTCCTGAGAACATCACCGTCAACATCTCCC 

AGGATGCTCTGCTCACCTGCCGGGCAGAGGCGTATCCGGGCAACCTCACCTACACCTGGTACTGGCAGGA 

CGAGAACGTCTACTTTCAGAACGACCTGAAGCTGAGGGTGCGCATCCTAATCGATGGGACCCTGATCATC 

TTCCGGGTGAAGCCGGAGGACTCAGGGAAGTACACCTGTGTGCCCAGCAACAGCCTGGGGCGCTCCCCCT 

CCGCCTCGGCGTACCTGACCGTGCAGTACCCAGCGCGTGTCCTCAACATGCCCCCTGTGATTTACGTGCC 

CGTGGGGATCCATGGCTACATCCGCTGCCCTGTGGACGCAGAACCACCGGCCACCGTGGTCAAGTGGAAC 

AAGGACGGCCGTCCCCTGCAGGTTGAGAAGAACCTCGGTTGGACCCTGATGGAGGATGGCTCCATTCGAA 

TTGAGGAGGCCACAGAGGAGGCTCTTGGCACTTATACCTGTGTGCCTTACAACACTCTGGGGACCATGGG

CCAGTCTGCCCCTGCGAGGCTTGTCCTGAAGGACCCCCCCTATTTCACGGTGCTACCAGGCTGGGAGTAC 

AGGCAGGAGGCCGGCCGGGAGCTACTTATCCCCTGTGCTGCCGCAGGGGACCCCTTTCCTGTCATCACTT 

GGAGAAAGGTAGGGAAGCCCAGCAGAAGCAAGCACAGTGCCCTGCCCAGTGGGAGCCTGCAGTTCCGTGC 

CCTGAGTAAGGAGGACCACGGGGAGTGGGAATGTGTCGCCACCAACGTGGTCACGAGCATCACTGCCAGC 

ACCCACCTCACCGTCATCGGTACGGGCACCAGCCCCCATGCCCCGGGCAGTGTCCGGGTCCAGGTCTCCA 

TGACAACTGCCAACGTGTCCTGGGAACCAGGCTATGATGGAGGCTACGAGCAGACATTCTCAGTTTGGTA 

CGGACCTCTGATGAAGCGGGCACAGTTTGGGCCCCATGACTGGCTGTCCTTGCCAGTGCCGCCAGGACCC 

AGCTGGCTGCTGGTGGACACCCTGGAGCCTGAGACAGCGTACCAGTTCAGCGTCCTGGCCCAGAAGCTGG 
GAACCAGCGCCTTCAGTGAGGTGGTCACTGTGAACACTTTAGCATTCCCTATTACAACTCCAGAACCCCT
GGTGCTGGTCACCCCACCGAGGTGCCTCATAGCCAATCGGACTCAGCAGGGTGTGCTCCTGTCCTGGCTT 
CCGCCTGCCAACCACAGCTTTCCCATCGACCGCTACATCATGGAGTTCCGTGTCGCAGAGCGCTGGGAGT 
TGCTCGACGATGGCATCCCCGGCACCGAAGGAGAGTTCTTTGCCAAGGATCTGTCACAGGACACGTGGTA 
TGAGTTCCGGGTTCTGGCCGTCATGCAGGATCTGATCGGCGAGCCCAGCAACATCGCCGGCGTCTCCAGC 
ACAGACATCTTCCCGCAGCCGGACCTGACCGAGGATGGGCTGGCGCGGCCTGTGCTGGCGGGAATCGTAG 
CTACCATCTGCTTCTTGGCAGCTGCCATCCTGTTCAGCACCCTGGCTGCCTGCTTTGTCAACAAGCAGCG 
CAAGCGTAAGCTCAAGCGCAAAAAAGACCCTCCACTCTCCATCACCCACTGCAGGAAGAGCCTGGAGTCT 
CCCTTGTCCTCTGGCAAGGTGAGCCCCGAGAGCATCCGCACGCTCCGAGCGCCGTCAGAATCCTCCGACG 
ACCAGGGCCAGCCCGCGGCCAAGAGGATGCTGAGCCCCACCCGTGAGAAGGAGCTGTCGCTGTACAAGAA
GACCAAGCGGGCCATCAGCAGCAAGAAGTACAGCGTGGCCAAGGCAGAGGCCGAGGCAGAGGCCACCACG
CCCATCGAGCTCATCAGCAGAGGCCCTGACGGCCGCTTCGTGATGGACCCTGTCGAGATGGAGCCCTCGC 
TGAAGAGCAGGCGCATCGAGGGCTTCCCCTTCGCCGAGGAGACGGACATGTACCCCGAGTTCCGCCAGTC 
GGACGAGGAGAACGAGGACCCACTGGTGCCCACATCTGTGGCCGCCCTGAAGTCCCAGCTCACCCCTCTG 
TCATCCAGCCAGGAGTCCTACCTGCCACCACCAGCATACAGCCCTCGGTTCCAGCCCCGCGGGCTGGAGG 
GCCCCGGTGGCCTGGAAGGTCGGCTTCAGGCCACAGGCCAGGCCCGGCCCCCTGCCCCCCGGCCCTTCCA 
CCATGGCCAGTATTATGGGTACCTCAGCAGCAGCAGCCCTGGGGAGGTGGAGCCGCCCCCGTTCTACGTG 
CCAGAAGTGGGCAGCCCCCTGAGCTCCGTCATGTCGTCCCCGCCCCTGCCCACCGAGGGGCCCTTTGGCC 
ACCCCACCATCCCCGAGGAGAATGGAGAGAATGCATCCAACAGCACGCTGCCCTTGACTCAGACACCTAC 
AGGAGGGCGCTCCCCTGAGCCCTGGGGCCGGCCAGAATTCCCCTTCGGGGGGCTGGAGACCCCAGCGATG 
ATGTTCCCCCACCAGCTGCCACCCTGTGATGTGCCCGAGAGTCTGCAGCCCAAGGCCGGCCTCCCCCGAG 
GACTGCCCCCCACCTCCCTGCAGGTGCCCGCGGCCTACCCGGGCATCCTGTCTCTGGAGGCACCGAAGGG 
TTGGGCAGGCAAGTCGCCCGGCAGGGGCCCTGTCCCAGCGCCCCCCGCCGCCAAGTGGCAGGACAGACCT 
ATGCAACCTCTGGTAAGCCAAGGGCAGCTGCGACATACAAGCCAAGGCATGGGCATACCTGTGCTGCCTT 
ACCCCGAGCCGGCTGAGCCGGGGGCGCACGGCGGCCCCAGCACATTTGGCCTGGACACCCGGTGGTATGA 
GCCCCAGCCCCGGCCCCGGCCTAGCCCTCGGCAGGCCAGGCGCGCCGAGCCCAGTTTACATCAAGTGGTG 
CTACAGCCCTCCCGGCTCTCACCTCTGACCCAAAGCCCCCTCAGCTCCCGCACCGGCTCCCCTGAGCTCG 
CCGCCCGTGCCCGGCCTCGCCCGGGCCTCCTGCAGCAGGCAGAGATGTCAGAGATCACCCTGCAGCCGCC 
GGCTGCAGTCAGCTTTTCTCGAAAGTCTACGCCGTCCACAGGCTCCCCCTCCCAGAGCAGCCGCAGTGGG 
AGTCCCAGCTACCGGCCCGCCATGGGCTTCACCACTCTGGCCACCGGCTACCCTTCCCCTCCACCCGGCC 
CCGCCCCTGCTGGGCCTGGGGACAGCTTGGACGTGTTTGGACAGACGCCTTCCCCTCGAAGGACGGGGGA 
GGAATTGCTCCGACCGGAGACCCCACCACCCACGTTACCTACTTCAGGGAAGCTGCGGAGAGACAGACCA
GCTCCCGCGACCAGCCCGCCTGAGAGAGCACTCTCTAAACTGTAGCAGCTG 
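
How the start codon in Table 45A relates to the encoded protein can be sketched as follows. This is a minimal illustration, assuming the standard genetic code and a naive "first ATG" rule; the function and constant names are hypothetical. Only the first line of Table 45A is used, so the output covers just the first few residues, and the input is assumed to contain only uppercase A, C, G and T.

```python
# Standard genetic code in the usual TCAG ordering (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_from_first_atg(nt: str) -> str:
    """Translate from the first ATG until a stop codon or the end of input.

    Naive sketch: a real start codon need not be the first ATG in a cDNA.
    """
    start = nt.find("ATG")
    if start < 0:
        return ""
    residues = []
    for pos in range(start, len(nt) - 2, 3):
        aa = CODON_TABLE[nt[pos:pos + 3]]
        if aa == "*":  # stop codon
            break
        residues.append(aa)
    return "".join(residues)


# First line of Table 45A as printed above.
TABLE_45A_FIRST_LINE = (
    "CTGGTGGGGGGCGGGGTGACCCTGTGACACGGACATGGGGCTGCTGGGG"
    "CAGGATCTCTTTGTCACCTCC"
)
print(translate_from_first_atg(TABLE_45A_FIRST_LINE))
```

The residues produced this way can be compared by eye with the opening residues of the protein in Table 45B.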



In a search of public sequence databases, the NOV45a nucleic acid sequence, located
on chromosome 11, has 2296 of 2298 bases (99%) identical to a
gb:GENBANK-ID:AB028953|acc:AB028953.1 mRNA from Homo sapiens (Homo sapiens mRNA for
KIAA1030 protein, partial cds). Public nucleotide databases include all GenBank databases 
and the GeneSeq patent database. 
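
The percentage quoted for a database hit follows from the match count and the aligned length. A minimal sketch is given below; the function name is hypothetical, and the use of truncation rather than rounding is an assumption inferred from the reported figure (2296 of 2298 is about 99.9% but is reported as 99%).

```python
def percent_identity(matches: int, aligned_length: int) -> int:
    """Whole-number percent identity, truncated (assumed convention)."""
    if aligned_length <= 0 or not 0 <= matches <= aligned_length:
        raise ValueError("need 0 <= matches <= aligned_length and length > 0")
    return (100 * matches) // aligned_length


# Figures quoted for the NOV45a nucleotide hit.
print(percent_identity(2296, 2298))
```

The same calculation applies to the "similar to" (positives) figures, with the count of conservative matches in place of the count of identical residues.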

The disclosed NOV45a polypeptide (SEQ ID NO: 106) encoded by SEQ ID NO: 105 
has 1328 amino acid residues and is presented in Table 45B using the one-letter amino acid
code. Signal P, Psort and/or Hydropathy results predict that NOV45a has a signal peptide and 
is likely to be localized at the plasma membrane with a certainty of 0.4600. The most likely 
cleavage site for a NOV45a peptide is between amino acids 19 and 20. 
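
The predicted cleavage between amino acids 19 and 20 can be sketched as a simple split of the sequence. The function name is hypothetical, and the N-terminal fragment below is copied from the first residues of Table 45B as printed; the prediction itself comes from the tools named above, not from this code.

```python
def split_signal_peptide(protein: str, cleavage_after: int = 19):
    """Split a protein into (signal peptide, mature chain).

    cleavage_after is the 1-based index of the last signal-peptide residue.
    """
    if not 0 < cleavage_after < len(protein):
        raise ValueError("cleavage site must fall inside the sequence")
    return protein[:cleavage_after], protein[cleavage_after:]


# First 40 residues of the NOV45a protein as printed in Table 45B.
N_TERMINUS = "MGLLGQDLFVTSFLCPTWPSPSGAHGLREEPEFVTARAGE"
signal, mature = split_signal_peptide(N_TERMINUS)
print(signal)
print(mature[:10])
```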



Table 45B. Encoded NOV45a protein sequence (SEQ ID NO: 106). 

MGLLGQDLFVTSFLCPTWPSPSGAHGLREEPEFVTARAGESWLRCDVIHPVTGQPPPYV 
VEWFKFGVPIPIFIKFGYYPPHVDPEYAGKVGAHGLREEPEFVTARAGESWLRCDVIHP 
VTGQPPPYWEWFKFGVPIPIFIKFGYYPPHVDPEYAGKVSLHDKASLRLEQVRSEDQGW
YECKVLMLDQQYDTFHNGSWVHLTINAPPTFTETPPQYIEAKEGGSITMTCTAFGNPKPI 
VTWLKEGTLLGASGKYQVSWLGSLTVTSVSREDRGAYTCRAYSIQGEAVHTTHLLVQGP 
PFIVSPPENITWISQDALLTCRAEAYPGNLTYTWYWQDENVYFQNDLKLRVRILIDGTL 
IIFRVKPEDSGKYTCVPSNSLGRSPSASAYLTVQYPARVLNMPPVIYVPVGIHGYIRCPV 
DAEPPATWKWNKDGRPLQVEKNLGWTLMEDGSIRIEEATEEALGTYTCVPYNTLGTMGQ 
SAPARLVLKDPPYFTVLPGWEYRQEAGRELLIPCAAAGDPFPVITWRKVGKPSRSKHSAL 
PSGSLQFRALSKEDHGEWECVATNWTSITASTHLTVIGTGTSPHAPGSVRVQVSMTTAN 
VSWEPGYDGGYEQTFSVWYGPLMKRAQFGPHDWLSLPVPPGPSWLLVDTLEPETAYQFSV
LAQKLGTSAFSEWTVNTLAFPITTPEPLVLVTPPRCLIANRTQQGVLLSWLPPANHSFP
IDRYIMEFRVAERWELLDDGIPGTEGEFFAKDLSQDTWYEFRVLAVMQDLIGEPSNIAGV
SSTDIFPQPDLTEDGLARPVLAGIVATICFLAAAILFSTLAACFVNKQRKRKLKRKKDPP 
LSITHCRKSLESPLSSGKVSPESIRTLRAPSESSDDQGQPAAKRMLSPTREKELSLYKKT 
KRAISSKKYSVAKAEAEAEATTPIELISRGPDGRFVMDPVEMEPSLKSRRIEGFPFAEET 
DMYPEFRQSDEENEDPLVPTSVAALKSQLTPLSSSQESYLPPPAYSPRFQPRGLEGPGGL 
EGRLQATGQARPPAPRPFHHGQYYGYLSSSSPGEVEPPPFYVPEVGSPLSSVMSSPPLPT 
EGPFGHPTIPEENGENASNSTLPLTQTPTGGRSPEPWGRPEFPFGGLETPAMMFPHQLPP 
CDVPESLQPKAGLPRGLPPTSLQVPAAYPGILSLEAPKGWAGKSPGRGPVPAPPAAKWQD 
RPMQPLVSQGQLRHTSQGMGIPVLPYPEPAEPGAHGGPSTFGLDTRWYEPQPRPRPSPRQ 
ARRAEPSLHQWLQPSRLSPLTQSPLSSRTGSPELAARARPRPGLLQQAEMSEITLQPPA 
AVSFSRKSTPSTGSPSQSSRSGSPSYRPAMGFTTLATGYPSPPPGPAPAGPGDSLDVFGQ 
TPSPRRTGEELLRPETPPPTLPTSGKLRRDRPAPATSPPERALSKL 



A search of sequence databases reveals that the NOV45a amino acid sequence has 761 
of 763 amino acid residues (99%) identical to, and 761 of 763 amino acid residues (99%) 
similar to, the 763 amino acid residue ptnr:SPTREMBL-ACC:Q9UPX0 protein from Homo 
sapiens (Human) (KIAA1030 PROTEIN). Public amino acid databases include the GenBank 
databases, SwissProt, PDB and PIR. 

NOV45a is expressed in at least brain, cerebral medulla/cerebral white matter, prostate, 
thalamus, placenta. Expression information was derived from the tissue sources of the 
sequences that were included in the derivation of the sequence of CG57656-01. The sequence
is predicted to be expressed in the following tissues because of the expression pattern of
(GENBANK-ID: gb:GENBANK-ID:AB028953|acc:AB028953.1) a closely related Homo
sapiens mRNA for KIAA1030 protein, partial cds homolog in species Homo sapiens: liver.
This information was derived by determining the tissue sources of the sequences that were 
included in the invention including but not limited to SeqCalling sources, Public EST sources, 
Literature sources, and/or RACE sources. 



NOV45b 

A disclosed NOV45b nucleic acid of 7097 nucleotides (also referred to as CG57656-02)
encoding an Ig/fibronectin-like protein is shown in Table 45C. The start and stop codons
are in bold letters.



Table 45C. NOV45b nucleotide sequence (SEQ ID NO: 107). 

ATGGGGCTGCTGGGGCAGGATCTCTTTGTCACCTCCTTTCTGTGTCCAACCTGGCCGTCC 
CCATCAGGCGCCCACGGCCTGCGAGAGGAACCCGAGTTTGTGACGGCAAGAGCTGGGGAG 
AGCGTGGTCCTGCGATGCGACGTGATCCACCCAGTGACGGGACAGCCCCCACCCTATGTC 
GTAGAGTGGTTCAAGTTCGGGGTCCCCATCCCTATCTTCATCAAGTTTGGCTACTACCCC 
CCACACGTGGACCCTGAGTATGCAGGCCGGGCCAGTCTTCATGATAAGGCATCTCTGCGG 
CTGGAACAAGTTCGCTCTGAGGACCTGGGCTGGTATGAGTGCAAAGTGCTCATGCTGGAC 
CAGCAGTATGACACCTTCCACAATGGCAGCTGGGTCCACCTCACCATCAACGCCCCTCCC 
ACCTTTACAGAAACACCCCCCCGGTACATCGAGGCCAAGGAGGGTGGTAGTATCACCATG
ACCTGCACAGCTTTTGGGAACCCCAAGCCCATTGTCACCTGGCTCAAGGAGGGGACGCTC 
CTCGGTGCTAGTGGGAAATACCAGGTGAGTGACGGCAGCCTGACAGTGACATCGGTCAGT 
CGGGAGGACAGAGGTGCCTACACCTGCCGAGCGTACAGCATTCAGGGGGAGGCTGTCCAC 
ACGACTCACCTGCTTGTCCCAGGGCCCCCTTTCATCGTCTCCCCTCCTGAGAACATCACC 
GTCAACATCTCCCAGGATGCTCTGCTCACCTGCCGGGCAGAGGCGTATCCGGGCAACCTC 
ACCTACACCTGGTACTGGCAGGACGAGAACGTCTACTTTCAGAACGACCTGAAGCTGAGG 
GTGCGCATCCTAATCGATGGGACCCTGATCATCTTCCGGGTGAAGCCGGAGGACTCGGGG 
AAGTACACCTGTGTGCCCAGCAACAGCCTGGGGCGCTCCCCCTCCGCCTCGGCGTACCTG 
ACCGTGCAGTACCCAGCGCGTGTCCTCAACATGCCCCCTGTGATTTACGTGCCCGTGGGG 
ATCCATGGCTACATCCGCTGCCCTGTGGACGCAAGACCACCGGCCACCGTGGTCAAGTGG 
AACAAGGACGGCCGTCCCCTGCAGGTTGAGAAGAACCGCGGTTGGACCCTGATGGAGGAT 
GGCTCCATTCGAATTGAGGAGGCCACAGAGGAGGCTCTTGGCACTTATACCTGTGTGCCT 
TACAACACTCTGGGGACCATGGGCCAGTCTGCCCCTGCGAGGCTTGTCCTGAAGGACCCC 
CCCTATTTCACGGTGCTACCAGGCTGGGAGTACAGGCAGGAGGCCGGCCGGGAGCTACTT 
ATCCCCTGTGCTGCCGCAGGGGACCCCTTTCCTGTCATCACTTGGAGCAAGGTAGGGAAG 
CCCAGCAGAAGCAAGCACAGTGCCCTGCCCAGTGGGAGCCTGCAGTTCCGTGCCCTGAGT 
AAGGAGGACCACGGGGAGTGGGAATGTGTCGCCACCAACGTGGTCACGAGCATCACTGCC 
AGCACCCACCTCACCGTCATCGGCACCAGCCCCCATGCCCCGGGCAGTGTCCGGGTCCAG 
GTCTCCATGACAACTGCCAACGTGTCCTGGGAACCAGGTGACGGGCTACGATGGGGCTAT 
GATGGAGGCTACGAGCAGACATTCTCAGTTTGGATGAAGCGGGCACAGTTTGGGCCCCAT 
GACTGGCTGTCCTTGCCAGTGCCGCCAGGACCCAGCTGGCTGCTGGTGGACACCCTGGAG 
CCTGAGACAGCGTACCAGTTCAGCGTCCTGGCCCAGAACAAGCTGGGAACCAGCGCCTTC 
AGTGAGGTGGTCACTGTGATCACTTTAGCATTCCCTATTACAACTCCAGAACCCCTGGTG 
CTGGTCACCCCACCGAGGTGCCTCATAGCCAATCGGACTCAGCAGGGTGTGCTCCTGTCC 
TGGCTTCCGCCTGCCAACCACAGCTTTCCCATCGACCGCTACATCATGGAGTTCCGTGTC 
GCAGAGCGCTGGGAGTTGCTCGACGATGGCATCCCCGGCACCGAAGGAGAGTTCTTTGCC 
AAGGATCTGTCACAGGACACGTGGTATGAGTTCCGGGTTCTGGCCGTCATGCAGGATCTG 
ATCGGCGAGCCCAGCAACATCGCCGGCGTCTCCAGCACAGACATCTTCCCGCAGCCGGAC 
CTGACCGAGGATGGGCTGGCGCGGCCTGTGCTGGCGGGAATCGTAGCTACCATCTGCTTC 
TTGGCAGCTGCCATCCTGTTCAGCACCCTGGCTGCCTGCTTTGTCAACAAGCAGCGCAAG 
CGTAAGCTCAAGCGCAAAAAAGACCCTCCACTCTCCATCACCCACTGCAGGAAGAGCCTG 
GAGTCTCCCTTGTCCTCTGGCAAGGTGAGCCCCGAGAGCATCCGCACGCTCCGAGCGCCG 
TCAGAATCCTCCGACGACCAGGGCCAGCCCGCGGCCAAGAGGATGCTGAGCCCCACCCGT 
GAGAAGGAGCTGTCGCTGTACAAGAAGACCAAGCGGGCCATCAGCAGCAAGAAGTACAGC 
GTGGCCAAGGCAGAGGCCGAGGCAGAGGCCACCACGCCCATCGAGCTCATCAGCAGAGGC 
CCTGACGGCCGCTTCGTGATGGACCCTGTCGAGATGGAGCCCTCGCTGAAGAGCAGGCGC 
ATCGAGGGCTTCCCCTTCGCCGAGGAGACGGACATGTACCCCGAGTTCCGCCAGTCGGAC 
GAGGAGAACGAGGACCCACTGGTGCCCACATCTGTGGCCGCCCTGAAGTCCCAGCTCACC 
CCTCTGTCATCCAGCCAGGAGTCCTACCTGCCACCACCAGCATACAGCCCTCGGTTCCAG 
CCCCGCGGGCTGGAGGGCCCCGGTGGCCTGGAAGGTCGGCTTCAGGCCACAGGCCAGGCC 
CGGCCCCCTGCCCCCCGGCCCTTCCACCATGGCCAGTATTATGGGTACCTCAGCAGCAGC 
AGCCCTGGGGAGGTGGAGCCGCCCCCGTTCTACGTGCCAGAAGTGGGCAGCCCCCTGAGC 
TCCGTCATGTCGTCCCCGCCCCTGCCCACCGAGGGGCCCTTTGGCCACCCCACCATCCCC 
GAGGAGAATGGAGAGAATGCATCCAACAGCACGCTGCCCTTGACTCAGACACCTACAGGA 
GGGCGCTCCCCTGAGCCCTGGGGCCGGCCAGAATTCCCCTTCGGGGGGCTGGAGACCCCA 
GCGATGATGTTCCCCCACCAGCTGCCACCCTGTGATGTGCCCGAGAGTCTGCAGCCCAAG 
GCCGGCCTCCCCCGAGGACTGCCCCCCACCTCCCTGCAGGTGCCCGCGGCCTACCCGGGC 
ATCCTGTCTCTGGAGGCACCGAAGGGTTGGGCAGGCAAGTCGCCCGGCAGGGGCCCTGTC 
CCAGCGCCCCCCGCCGCCAAGTGGCAGGACAGACCTATGCAACCTCTGGTAAGCCAAGGG 
CAGCTGCGACATACAAGCCAAGGCATGGGCATACCTGTGCTGCCTTACCCCGAGCCGGCT 
GAGCCGGGGGCGCACGGCGGCCCCAGCACATTTGGCCTGGACACCCGGTGGTATGAGCCC 
CAGCCCCGGCCCCGGCCTAGCCCTCGGCAGGCCAGGCGCGCCGAGCCCAGTTTACATCAA 
GTGGTGCTACAGCCCTCCCGGCTCTCACCTCTGACCCAAAGCCCCCTCAGCTCCCGCACC 
GGCTCCCCTGAGCTCGCCGCCCGTGCCCGGCCTCGCCCGGGCCTCCTGCAGCAGGCAGAG 
ATGTCAGAGATCACCCTGCAGCCGCCGGCTGCAGTCAGCTTTTCTCGAAAGTCTACGCCG 
TCCACAGGCTCCCCCTCCCAGAGCAGCCGCAGTGGGAGTCCCAGCTACCGGCCCGCCATG 
GGCTTCACCACTCTGGCCACCGGCTACCCTTCCCCTCCACCCGGCCCCGCCCCTGCTGGG 
CCTGGGGACAGCTTGGACGTGTTTGGACAGACGCCTTCCCCTCGAAGGACGGGGGAGGAA 
TTGCTCCGACCGGAGACCCCACCACCCACGTTACCTACTTCAGGGAAGCTGCAGAGAGAC 
AGACCAGCTCCCGCGACCAGCCCGCCTGAGAGAGCACTCTCTAAACTGTAG CAGCTGGTA 
TTCCAGCTATCTGGGCAGTGTTGTCGAGACAAGCCTCTCCTCAGCTCAATAGGTAAGGGA 
GCTCCTCGGCTGGGCGGGCGGGGGCGGGCAGGCGGACGGGGCTTCGGCCGGGCCATTGCT
TCCTGGACAGGGGATCCAAACCATGTCCCCTCACCGCCCCGTGGGGTGGCCGCTGCCGCT 
CCTCGATCCCCACGGCTTCTGGGTTCCCCACATCGAGCCACGCTCGGCACCGCCTAGCTG 
CAGCTTCTGCCCCCCACCCCCCGTCCCCATGTCCGGC CCCTCTAGAGCCCCCT GCGGCTC 
TCCTCACTCCTCGCCAGCCTCGCCAGCCTGCTGCCAGTACACCCAGGGCCTCCCACAGAA 
GCCCTGGGGCCCTGCATGCACCCCGAAGGGGCCCAGAACACCCGAGATCTGCTTTGCATC 
TCTGCACCCTTGGGGACCTCTCTGGGGCCCCTCTGGTATCCAGGAAGAGCCCCCCCGCAC 
CCTATTCTCCCAGCA GCCCCAG GCATAACATCCCTTCTCCTGGGG TGGCTGGCCCCTCAC 
CAGCTTCTTGAGATCTTCGTGGTATTAGG CTCTCTCGAGGAGTACAAGGTTGGGATGAGC 
CCCACTTCTTCCTTGACAGTGGG GTCCTCTGCAGGGTCAGGTCCCTG CAGTCCTGTGGGG 
CTTTGTAGCGAAATGTCATCACTC CCCTGTGCTCCTTGCCTTCTGCAGCCTGACCTTGGT 
AGTCACAGGACTGAAATGTGACCTAGCCCTTGGGTCCACATTGCTTT CAAGGTCCCTGTT 
GTAGCCTTGTCGTCCTCCCTGGGTCATGCT GTAGGGAAAGGGTCTTAGGAGGACCTCCCC 
GAGGGGAGGGGGCAG GCTTCCC CTGGGCAGACAGGCATTGAGCTG GCAGGAAGTGAAACC 
CCCAGGGACCAGCACTGTGTCCCTCCCCCTGCCCCCGCAGTCAGCCTCTCCTTGAGTGTC 
CTTCCAGGTGGAAGCTAAAGGAACTTGGGCATTCAGGGAGCCTCTGA GTCCCTCACAGAG 
ACTCAGGGCCACCCGAAGCTCGCTCCCTC TCACAG TAGCCTGACAAGGTGCTGCCGCTCG 
CCACGGAC CCTCTGCCCTGTGC CTGGGCACACACAGGCATCGGGCACCTGCATGGGAGAC 
GGCGAGCCTCCCGTCCAGGTGCTTCTGGC TTTCCAGGCGAGAAGGAGACAGGTGCCTTCC 
CCCCCTAGAGATGTCAAGGGAGGTTCACTTTCCTAACCAGGGCTATAAATCTCATTCATT 
CCTAAGAGTGGCCTCCATAAAGAGGACTGCCCTGAC ATTTCTTTACCATCTGTAGCTATA 
AAATTGTCAGCAGCCCAGAGGCC CTGGAA TTGTGGGTGAGGCTGTCTGGCATAGGGTGAC 
TGTGAGGGCTGTCAGGCAGCGTT TATGAGCTCACCTG CTCTGGGGCCCTCCTGCGCCCAG 
AGACCCACTGGTCTTCCACCTTCC CCAGCTCCCTCCCT GTCCCAGCTGTGACTCCTCCAC 
AGCCCCCGGGAGGTGTGGGAGCCCTAGACTAGCTCTCACCCCAGCTC CTGGAAATTCCTA 
GACTCTGCTCTCATGTGTTATTCCTCACCCCTTCACTCAACTCCATTGACCCTCCCCTTT 
CCCAGTGTCCCCACTGTGCCAGATCCAAGAGAAGCCCGCTCTCCTTTCACTGCTGTGGCA 
AACCCAG AAACCAGGGGCAGCGATGAGGGACCATCGTTTCTCTCCCG GGCCTGGCCAGCA 
TCCCCAGC CTAGGAGAAGGAGACCCTCCCCATCCTGA GTCAGCCCCTCGGTGCTGGCCTC 
TCCTGCCT GCTGGGAGCCTCCCCGGTACGGCTGCTGGG TCCTGGGGAGATGCAGGCTCTG 
TTCAGATGCTGTCTCCATGCTGCACCTTT GCATGTGTGCCCTCTTGGTCTTGCTTGGAGA 
AGTTGTGACAGCTCTTGTCCACG TGCGTT CTCACTGTTTCTTTTCCCTTCAGTTCCTCTC 
CTGTTCCTTTGGTCCATATGT AGTTGTTGCCCCTGCCTCTCCTTTTCTCT CTCCCTTTTT 
TTCTCCTCTTCCCAACTCCCTTCTCAACATGGGGAGTTGTATTGC GTGGACCTGTCCTTT 
CTGTGGACTGTGGGGAGGCCGAGCCAATT CTTAGCTCCTGTCTGAGCCTAGTGATGCCCT 
GTCCTCACCTCTGCTTCCACTCCCTTTCCCA TTCAGGC TGACTGTCACCGCAGGGAGGCA 
GGTGGGTCTGGAGCATGGAGTGGCCCCTGTCCTGGGTC AGCTGTAGGAAGGGGCCCCTTT 
GTAAACA AGCTTCTGCCAGGTTC TGAGGC AGAGCACTGGTCTGAGTCCGGCCCCACAGAA 
GGGTTCCATCTTGTGGTCAGCGGGGTGG GCTCAGACCTGCAGACTCCCGCAGTGGGCCCA 
CCCTTTCCCTTGCTATGGCCCGTCTATGAG TGGCTTTCCCCTGCCTCCATCGGTGTATAA 
TCAGGCAGGTGGTGT TTGCTCTTAGGTGATTAGAAGA GGAGGAAAATGCCCATCAAGCCA 
CTGCTTCAGCCTGTTGGCCAGGGAACACCTGAACAGGGTCAAAATGG TCAGCTCCTGGGC 
GCTGAGCTAAGGAAATATGCAGGGGGTTTTGACTTTCCCTGTCCTGAGGCTTGCCTCCAC 
TTTGACCTGGCTGCATCTTTCCTC AGCTCTTCTGGGTA AGCTGAGAGCTAAGCAAATAAG 
ACCTCCTGGGCTACCTGGTCATGATTTGGGTTTTGTAT GGATTTTTTGAAAAGAGAGAAA 
AAGAGTGTGGGCTGGGTGCCGTGGCCCAGA GGGCAGATCTCTTGAGTCCAGGAGTTCAAG 
GCCAGCCTGGGCAAC ATAGTGA AGCCCCATCTCTGCAAAAAAATACAAAAAAATAGCTGG 
GCATGGTGGCACAT GCCTATAGTCCCAGCTCCTTGGGGGGCTGAGGTGGGAAGATCATCT 
GAGCTGGGGAGGTCGAGGCAGC AGTGAGCCGAGATTGCGCCACTTCACTCCAGCCTGGGT 
GACAGAGATCCTGTCTC 



In a search of public sequence databases, the NOV45b nucleic acid sequence, located
on chromosome 11, has 5312 of 5319 bases (99%) identical to a
gb:GENBANK-ID:AB028953|acc:AB028953.1 mRNA from Homo sapiens (Homo sapiens mRNA for
KIAA1030 protein, partial cds). Public nucleotide databases include all GenBank databases
and the GeneSeq patent database. 

The disclosed NOV45b polypeptide (SEQ ID NO: 108) encoded by SEQ ID NO: 107
has 1356 amino acid residues and is presented in Table 45D using the one-letter amino acid
code. Signal P, Psort and/or Hydropathy results predict that NOV45b has a signal peptide and
is likely to be localized at the plasma membrane with a certainty of 0.4600. The most likely
cleavage site for a NOV45b peptide is between amino acids 19 and 20.
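
The relationship between the 1356-residue NOV45b protein and the 7097-nucleotide sequence can be checked with simple arithmetic. The sketch below assumes that the coding region begins at the first nucleotide (Table 45C opens with ATG) and that one stop codon follows the last residue; the function name is hypothetical.

```python
def orf_length_nt(n_residues: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode n_residues, optionally plus a stop codon."""
    if n_residues < 0:
        raise ValueError("n_residues must be non-negative")
    return 3 * (n_residues + (1 if include_stop else 0))


TOTAL_NT = 7097      # length stated for the NOV45b nucleic acid
N_RESIDUES = 1356    # length stated for the NOV45b polypeptide

coding_nt = orf_length_nt(N_RESIDUES)
untranslated_nt = TOTAL_NT - coding_nt  # 3' sequence after the stop codon
print(coding_nt, untranslated_nt)
```

Under these assumptions, a little over half of the disclosed sequence is coding and the remainder lies downstream of the stop codon.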



Table 45D. Encoded NOV45b protein sequence (SEQ ID NO: 108). 



MGLLGQDLFVTSFLCPTWPSPSGAHGLREEPEFVTARAGESWLRCDVIHPVTGQPPPYV 
VEWFKFGVPIPIFIKFGYYPPHVDPEYAGRASLHDKASLRLEQVRSEDLGWYECKVLMLD
QQYDTFHNGSWVHLTINAPPTFTETPPRYIEAKEGGSITMTCTAFGNPKPIVTWLKEGTL 
LGASGKYQVSDGSLTVTSVSREDRGAYTCRAYSIQGEAVHTTHLLVPGPPFIVSPPENIT 
VNISQDALLTCRAEAYPGNLTYTWYWQDENVYFQNDLKLRVRILIDGTLIIFRVKPEDSG
KYTCVPSNSLGRSPSASAYLTVQYPARVLNMPPVIYVPVGIHGYIRCPVDARPPATWKW
NKDGRPLQVEKNRGWTLMEDGSIRIEEATEEALGTYTCVPYNTLGTMGQSAPARLVLKDP 
PYFTVLPGWEYRQEAGRELLIPCAAAGDPFPVITWSKVGKPSRSKHSALPSGSLQFRALS
KEDHGEWECVATNWTSITASTHLTVIGTSPHAPGSVRVQVSMTTANVSWEPGDGLRWGY
DGGYEQTFSVWMKRAQFGPHDWLSLPVPPGPSWLLVDTLEPETAYQFSVLAQNKLGTSAF
SEWTVITLAFPITTPEPLVLVTPPRCLIANRTQQGVLLSWLPPANHSFPIDRYIMEFRV
AERWELLDDGIPGTEGEFFAKDLSQDTWYEFRVLAVMQDLIGEPSNIAGVSSTDIFPQPD
LTEDGLARPVLAGIVATICFLAAAILFSTLAACFVNKQRKRKLKRKKDPPLSITHCRKSL
ESPLSSGKVSPESIRTLRAPSESSDDQGQPAAKRMLSPTREKELSLYKKTKRAISSKKYS 
VAKAEAEAEATTPIELISRGPDGRFVMDPVEMEPSLKSRRIEGFPFAEETDMYPEFRQSD 
EENEDPLVPTSVAALKSQLTPLSSSQESYLPPPAYSPRFQPRGLEGPGGLEGRLQATGQA 
RPPAPRPFHHGQYYGYLSSSSPGEVEPPPFYVPEVGSPLSSVMSSPPLPTEGPFGHPTIP 
EENGENASNSTLPLTQTPTGGRSPEPWGRPEFPFGGLETPAMMFPHQLPPCDVPESLQPK 
AGLPRGLPPTSLQVPAAYPGILSLEAPKGWAGKSPGRGPVPAPPAAKWQDRPMQPLVSQG 
QLRHTSQGMGIPVLPYPEPAEPGAHGGPSTFGLDTRWYEPQPRPRPSPRQARRAEPSLHQ 
WLQPSRLSPLTQSPLSSRTGSPELAARARPRPGLLQQAEMSEITLQPPAAVSFSRKSTP
STGSPSQSSRSGSPSYRPAMGFTTLATGYPSPPPGPAPAGPGDSLDVFGQTPSPRRTGEE
LLRPETPPPTLPTSGKLQRDRPAPATSPPERALSKL 



A search of sequence databases reveals that the NOV45b amino acid sequence has 425
of 911 amino acid residues (46%) identical to, and 553 of 911 amino acid residues (60%)
similar to, the 1189 amino acid residue ptnr:SPTREMBL-ACC:Q9P2J2 protein from Homo
sapiens (Human) (KIAA1355 PROTEIN). Public amino acid databases include the GenBank 
databases, SwissProt, PDB and PIR. 

NOV45b is expressed in at least prostate, brain (cerebellum). This information was 
derived by determining the tissue sources of the sequences that were included in the invention 
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or 
RACE sources. 

The disclosed NOV45a polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 45E. 

Table 45E. BLAST results for NOV45

Gene Index/Identifier                     Protein/Organism          Length (aa)  Identity (%)   Positives (%)  Expect

gi|5689397|dbj|BAA82982.1| (AB028953)     KIAA1030 protein          763          761/763 (99%)  761/763 (99%)  0.0
                                          [Homo sapiens]

gi|15311046|ref|XP_027486.2| (XM_027486)  KIAA1030 protein          747          583/627 (92%)  583/627 (92%)  0.0
                                          [Homo sapiens]

gi|18578690|ref|XP_062186.2| (XM_062186)  similar to KIAA1355       904          413/472 (87%)  417/472 (87%)  0.0
                                          protein [Homo sapiens]

gi|7243091|dbj|BAA92593.1| (AB037776)     KIAA1355 protein          1189         410/882 (46%)  527/882 (59%)  e-179
                                          [Homo sapiens]

gi|18426807|ref|NP_291086.1| (NM_033608)  neural cell adhesion      1179         409/866 (47%)  525/866 (60%)  e-177
                                          molecule (Ncam)-like;
                                          KIAA1355 hypothetical
                                          protein (human);
                                          NCAM-like protein NRT1
                                          [Mus musculus]


Tables 45F-H list the domain descriptions from DOMAIN analysis results against
NOV45. This indicates that the NOV45 sequence has properties similar to those of other proteins
known to contain this domain.

Table 45F. Domain Analysis of NOV45 

gnl|Smart|smart00409, IG, Immunoglobulin

CD-Length = 86 residues, 100.0% aligned 

Score = 62.4 bits (150), Expect = 2e-10 



Table 45G. Domain Analysis of NOV45 

gnl|Smart|smart00408, IGc2, Immunoglobulin C-2 Type

CD-Length = 63 residues, 92.1% aligned 

Score = 54.7 bits (130), Expect = 4e-08 



Table 45H. Domain Analysis of NOV45 

gnl|Pfam|pfam00041, fn3, Fibronectin type III domain.

CD-Length = 86 residues, 90.7% aligned 

Score = 43.5 bits (101), Expect = 8e-05 
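
The "CD-Length" and "% aligned" figures in Tables 45F-H can be turned into residue counts with a short calculation. This sketch assumes that "% aligned" denotes the fraction of the domain model covered by the alignment; the function name is hypothetical.

```python
def aligned_residues(cd_length: int, percent_aligned: float) -> int:
    """Approximate number of domain-model residues covered by the alignment."""
    if cd_length <= 0 or not 0.0 <= percent_aligned <= 100.0:
        raise ValueError("cd_length must be positive and percent in [0, 100]")
    return round(cd_length * percent_aligned / 100)


# (domain, CD-Length, % aligned) as reported in Tables 45F, 45G and 45H.
DOMAIN_HITS = [("IG", 86, 100.0), ("IGc2", 63, 92.1), ("fn3", 86, 90.7)]
for name, length, pct in DOMAIN_HITS:
    print(name, aligned_residues(length, pct), "of", length)
```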

The basic structure of immunoglobulin (Ig) molecules is a tetramer of two light chains
and two heavy chains linked by disulfide bonds. There are two types of light chains: kappa and
lambda, each composed of a constant domain (CL) and a variable domain (VL). There are five
types of heavy chains: alpha, delta, epsilon, gamma and mu, all consisting of a variable
domain (VH) and three (in alpha, delta and gamma) or four (in epsilon and mu) constant
domains (CH1 to CH4). The major histocompatibility complex (MHC) molecules are made of
two chains. In class I the alpha chain is composed of three extracellular domains, a
transmembrane region and a cytoplasmic tail. The beta chain (beta-2-microglobulin) is
composed of a single extracellular domain. In class II, both the alpha and the beta chains are
composed of two extracellular domains, a transmembrane region and a cytoplasmic tail. It is
known that the Ig constant chain domains and a single extracellular domain in each type of
MHC chain are related. These homologous domains are approximately one hundred amino
acids long and include a conserved intradomain disulfide bond. Members of the
immunoglobulin superfamily are found in hundreds of proteins of different functions.
Examples include antibodies, the giant muscle kinase titin and receptor tyrosine kinases.
Immunoglobulin-like domains may be involved in protein-protein and protein-ligand
interactions.

Fibronectins are multi-domain glycoproteins found in a soluble form in plasma, and in
an insoluble form in loose connective tissue and basement membranes. They contain multiple
copies of 3 repeat regions (types I, II and III), which bind to a variety of substances including
heparin, collagen, DNA, actin, fibrin and fibronectin receptors on cell surfaces. The wide
variety of these substances means that fibronectins are involved in a number of important
functions: e.g., wound healing; cell adhesion; blood coagulation; cell differentiation and
migration; maintenance of the cellular cytoskeleton; and tumour metastasis. The role of
fibronectin in cell differentiation is demonstrated by the marked reduction in the expression of
its gene when neoplastic transformation occurs. Cell attachment has been found to be
mediated by the binding of the tetrapeptide RGDS to integrins on the cell surface, although
related sequences can also display cell adhesion activity. Plasma fibronectin occurs as a dimer
of 2 different subunits, linked together by 2 disulphide bonds near the C-terminus. The
difference in the 2 chains occurs in the type III repeat region and is caused by alternative
splicing of the mRNA from one gene. The observation that, in a given protein, an individual
repeat of one of the 3 types (e.g., the first FnIII repeat) shows much less similarity to its
subsequent tandem repeats within that protein than to its equivalent repeat between
fibronectins from other species, has suggested that the repeating structure of fibronectin arose
at an early stage of evolution. It also seems to suggest that the structure is subject to high
selective pressure. The fibronectin type III repeat region is an approximately 100 amino acid
domain, different tandem repeats of which contain binding sites for DNA, heparin and the cell
surface. The superfamily of sequences believed to contain FnIII repeats represents 45
different families, the majority of which are involved in cell surface binding in some manner,
or are receptor protein tyrosine kinases, or cytokine receptors.

The disclosed NOV45 nucleic acid of the invention encoding an Ig/fibronectin domain-like
protein includes the nucleic acid whose sequence is provided in Table 45A or 45C or a
fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose
bases may be changed from the corresponding base shown in Table 45A or 45C while still
encoding a protein that maintains its Ig/fibronectin domain-like activities and physiological
functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids
whose sequences are complementary to those just described, including nucleic acid fragments
that are complementary to any of the nucleic acids just described. The invention additionally
includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures
include chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 1 percent of the bases may be so changed.

The disclosed NOV45 protein of the invention includes the Ig/fibronectin domain-like
protein whose sequence is provided in Table 45B or 45D. The invention also includes a
mutant or variant protein any of whose residues may be changed from the corresponding
residue shown in Table 45B or 45D while still encoding a protein that maintains its
Ig/fibronectin domain-like activities and physiological functions, or a functional fragment
thereof. In the mutant or variant protein, up to about 54 percent of the residues may be so
changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this Ig/fibronectin
domain-like protein (NOV45) may function as a member of an "Ig/fibronectin domain family".
Therefore, the NOV45 nucleic acids and proteins identified here may be useful in potential
therapeutic applications implicated in (but not limited to) various pathologies and disorders as
indicated below. The potential therapeutic applications for this invention include, but are not
limited to: protein therapeutic, small molecule drug target, antibody target (therapeutic,
diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene
therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of
all tissues and cell types composing (but not limited to) those defined here.

The NOV45 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the Ig/fibronectin domain-like
protein (NOV45) may be useful in gene therapy, and the Ig/fibronectin domain-like
protein (NOV45) may be useful when administered to a subject in need thereof. By way of
nonlimiting example, the compositions of the present invention will have efficacy for
treatment of patients suffering from Von Hippel-Lindau (VHL) syndrome, Alzheimer's
disease, stroke, tuberous sclerosis, hypercalcemia, Parkinson's disease, Huntington's disease,
cerebral palsy, epilepsy, Lesch-Nyhan syndrome, multiple sclerosis, ataxia-telangiectasia,
leukodystrophies, behavioral disorders, addiction, anxiety, pain, neuroprotection, fertility, or
other pathologies or conditions. The NOV45 nucleic acid encoding the Ig/fibronectin domain-like
protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV45 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immunospecifically to the novel NOV45 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX 
Antibodies" section below. The disclosed NOV45 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding the pathology of the disease and in developing new drug targets for various 
disorders. 

NOV46 

A disclosed NOV46 nucleic acid of 1247 nucleotides (also referred to as CG57682-01) 
encoding a G2/mitotic-specific cyclin B2-like protein is shown in Table 46A. The start and 
stop codons are in bold letters. 



Table 46A. NOV46 nucleotide sequence (SEQ ID NO:109). 



ATCCACATTGCATTTAGTTGTCAGGTATGCTATTAGGACTCTCCAAAGAGACAGAACCAATAGGGGAAAT 
GGGTATATGCAGAGACAGAGAGACAGGAGTTAATTCTAAACCTAAGAGTCATGTGACTATTAGGCATTCC 
ATTTTAGAAAAAATTGGAAATAGAGTTACAGCCAGAGCAGCACAAGTAGCTAAGAAAGCTCAGAACACAC 
AAAGTGCCAGTTCAACCCAGGGAAACAACAAATGTCAACAAACAACTGAAACCTACTGCTTCTGTGAAAC 
AGTACAGATGGAAATGTTGGCTCCAAAGGGTCCTTCTCCCATACCTGAGGATGCCTCCGTGAAAGAAGAG 
AACATCTGCCGAGCTTTTTCTGATGCTTTGCTCTACAAAATTGAGGATATTGATAACAAAGATTGGAATA 
ACCCTCAGCTCTGCAGTGACTATTTAAGGAAGGGTATCTATCAGTACCTCAGGCAGCTGGAGATTTTGCA 
GTTCATAAACCCACATGTCTTAGGTGGAGGAGATGTAAATGGACATAAGCATACCATCCTGGTAGACTGG 
TTGGTGCAAATCCACTCCAAGTTTAGGCTTCTTCAGGAGACTCTGTATGTGTGTGTTGCCATTATGGATG 
GATCTTTACTGGTTCAGCCAGTTTCCCAGAGGAAGCTTCAACTAGTTTGGATTACTGCTCTGCTCTTGGC 
TTCCAAGTATGAGGAGATGTTTTCTCCAAATACTGAAGGCTTTGTTTACATCACAGACAATGCTTATACT 
AGTTTCCAAATCCAAGAAATGGAAACTCTAATTTTGAAAGAACTGAAATTTGAGGTGGGTGGACCCTTGC 
CACTACACTTCTTAAGGCAAGCATCAAAAGCCGGGAAGGCTGATGTTGAACAGCACACTTTAGCCAAATA 
TTTGATGGAGCTGACTCTCATTGACTACGATATGATGCATTATCATCCTTCTAAGGCAGCAACAGCTGCT 
TCCTGCTTGTCTCAGAAGGTTCTGGGCCAAGGAAAATGGAACTTAAAGCAGCAGTGTTATACAGGATACA 
CACAGAATGAAGTATTGGAAGTCATGCAGCACGTGGCCAAAAATGTGCTGAAAGTAAATGAAAACTTAAC 
TAAATTCATCGCCATCAAGAATAAGTATGCAAGCAACAAATTCCTGAAGATCAGCATGATCCCTCAGCTG 
AACTCAAAAGCCATCAAAGACCTTGCCTTCCCTCTGATGGGAGGGTCCTAGGCTGCA 



In a search of public sequence databases, the NOV46 nucleic acid sequence, located on 
chromosome 7, has 1068 of 1172 bases (91%) identical to a gb:GENBANK- 
ID:HSM800659|acc:AL080146.1 mRNA from Homo sapiens (Homo sapiens mRNA; cDNA 
DKFZp434B174 (from clone DKFZp434B174); complete cds). Public nucleotide databases 
include all GenBank databases and the GeneSeq patent database. 

The disclosed NOV46 polypeptide (SEQ ID NO:110) encoded by SEQ ID NO:109 has 
404 amino acid residues and is presented in Table 46B using the one-letter amino acid code. 
SignalP, Psort and/or Hydropathy results predict that NOV46 has a signal peptide and is 
likely to be localized in the cytoplasm with a certainty of 0.6500. 
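
The relationship between Table 46A and Table 46B can be illustrated with a short sketch. It is a minimal example, not the method used to prepare the tables: the function name `translate_first_orf` and the constant `NOV46_FRAGMENT` are hypothetical, the standard genetic code is assumed, and the reading frame is assumed to begin at the first ATG. The fragment is the first line of Table 46A.

```python
# Standard genetic code laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_first_orf(nucleotides: str) -> str:
    """Translate from the first ATG until a stop codon or the end of the input."""
    seq = "".join(nucleotides.split()).upper()  # tolerate spaces and line breaks
    start = seq.find("ATG")
    if start < 0:
        return ""
    residues = []
    for pos in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[pos:pos + 3], "X")  # X marks an unreadable codon
        if amino_acid == "*":  # stop codon ends the reading frame
            break
        residues.append(amino_acid)
    return "".join(residues)


# First line of Table 46A (hypothetical constant name).
NOV46_FRAGMENT = (
    "ATCCACATTGCATTTAGTTGTCAGGTATGCTATTAGGACTCTCCAAAGAGACAGAACCAATAGGGGAAAT"
)
print(translate_first_orf(NOV46_FRAGMENT))  # MLLGLSKETEPIGE
```

The printed residues agree with the first fourteen residues of Table 46B.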



Table 46B. Encoded NOV46 protein sequence (SEQ ID NO:110). 

MLLGLSKETEPIGEMGICRDRETGVNSKPKSHVTERHSILEKIGNRVTARAAQVAKKAQN 

TQSASSTQGNNKCQQTTETYCFCETVQMEMLAPKGPSPIPEDASVKEENICRAFSDALLY 

KIEDIDNKDWNNPQLCSDYLRKGIYQYLRQLEILQFINPHVLGGGDVNGHKHTILVDWLV 

QIHSKFRLLQETLYVCVAIMDGSLLVQPVSQRKLQLVWITALLLASKYEEMFSPNTEGFV 

YITDNAYTSFQIQEMETLILKELKFEVGGPLPLHFLRQASKAGKADVEQHTLAKYLMELT 

LIDYDMMHYHPSKAATAASCLSQKVLGQGKWNLKQQCYTGYTQNEVLEVMQHVAKNVLKV 

NENLTKFIAIKNKYASNKFLKISMIPQLNSKAIKDLAFPLMGGS 



A search of sequence databases reveals that the NOV46 amino acid sequence has 303 
of 383 amino acid residues (79%) identical to, and 333 of 383 amino acid residues (86%) 
similar to, the 398 amino acid residue ptnr:SWISSNEW-ACC:O95067 protein from Homo 
sapiens (Human) (G2/MITOTIC-SPECIFIC CYCLIN B2). Public amino acid databases 
include the GenBank databases, SwissProt, PDB and PIR. 
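
The percentages quoted for the database matches can be reproduced from the match counts. The sketch below is illustrative only: the function name `percent_truncated` is hypothetical, and truncation toward zero (rather than rounding) is an assumption that happens to agree with the three figures quoted above for NOV46.

```python
def percent_truncated(matches: int, aligned: int) -> int:
    """Whole-number percentage of matching positions, truncated (assumed convention)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# Counts quoted in the text for NOV46.
print(percent_truncated(1068, 1172))  # nucleotide identity: 91
print(percent_truncated(303, 383))    # amino acid identity: 79
print(percent_truncated(333, 383))    # amino acid similarity: 86
```

Rounding to the nearest integer would give 87 rather than 86 for the similarity figure, which is why truncation is assumed here.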

NOV46 is expressed in at least Adrenal gland, Aorta, B-cells, Blood, Bone, Brain, 
Breast, CNS, Colon, Ear, Esophagus, Eye, Gall bladder, Germ Cell, Head and neck, Heart, 
Kidney, Larynx, Liver, Lung, Lymph, Marrow, Muscle, Neural, Omentum, Ovary, Pancreas, 
Parathyroid, Peripheral nervous system, Placenta, Pooled, Prostate, Skin, Small intestine, 
Spleen, Stomach, Synovial membrane, Testis, Tissue culture, Tonsil, Uterus, and Whole embryo. 
This information was derived by determining the tissue sources of the 
sequences that were included in the invention including but not limited to SeqCalling sources, 
Public EST sources, Literature sources, and/or RACE sources. 

The disclosed NOV46 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 46C. 



Table 46C. BLAST results for NOV46 


Gene Index/Identifier                      Protein/Organism                      Length (aa)  Identity (%)    Positives (%)   Expect 

gi|4757930|ref|NP_004692.1| (NM_004701)    cyclin B2 [Homo sapiens]              398          303/387 (78%)   333/387 (85%)   e-67 

gi|5921730|sp|O77689|CGB2_BOVIN            G2/mitotic-specific cyclin B2         398          285/385 (74%)   328/385 (85%)   e-62 

gi|584914|sp|P37883|CGB2_MESAU             G2/mitotic-specific cyclin B2         397          288/385 (74%)   321/385 (82%)   e-58 

gi|14198371|gb|AAH08247.1|AAH08247         Similar to cyclin B2 [Mus musculus]   398          278/382 (72%)   316/382 (81%)   e-51 
(BC008247) 

gi|6680866|ref|NP_031656.1| (NM_007630)    cyclin B2 [Mus musculus]              398          274/382 (71%)   313/382 (81%)   e-49 

Tables 46D-E list the domain descriptions from DOMAIN analysis results against 
NOV46. This indicates that the NOV46 sequence has properties similar to those of other 
proteins known to contain these domains. 



Table 46D. Domain Analysis of NOV46 



gnl|Pfam|pfam00134, cyclin, Cyclin, N-terminal domain. Cyclins 
regulate cyclin dependent kinases (CDKs). Cyclins contain two domains 
of similar all-alpha fold, of which this family corresponds with the 
N-terminal domain. 

CD-Length = 128 residues, 99.2% aligned 

Score = 119 bits (298), Expect = 3e-28 



Table 46E. Domain Analysis of NOV46 

gnl|Smart|smart00385, CYCLIN, domain present in cyclins, TFIIB and 
Retinoblastoma; A helical domain present in cyclins and TFIIB (twice) 
and Retinoblastoma (once). A protein recognition domain functioning in 
cell-cycle and transcription control. 

CD-Length = 83 residues, 100.0% aligned 

Score = 62.8 bits (151), Expect = 4e-11 

Two B-type cyclins, B1 and B2, have been identified in mammals. Proliferating cells 
express both cyclins, which bind to and activate p34(cdc2). To test whether the two B-type 
cyclins have distinct roles, lines of transgenic mice were generated, one lacking cyclin B1 and 
the other lacking cyclin B2. Cyclin B1 proved to be an essential gene; no homozygous B1-null 
pups were born. In contrast, nullizygous B2 mice developed normally and did not display any 
obvious abnormalities. Both male and female cyclin B2-null mice were fertile, which was 
unexpected in view of the high levels and distinct patterns of expression of cyclin B2 during 
spermatogenesis. The expression of cyclin B1 overlaps the expression of cyclin B2 in the 
mature testis, but not vice versa. Cyclin B1 can be found both on intracellular membranes and 
free in the cytoplasm, in contrast to cyclin B2, which is membrane-associated. These 
observations suggest that cyclin B1 may compensate for the loss of cyclin B2 in the mutant 
mice, and imply that cyclin B1 is capable of targeting the p34(cdc2) kinase to the essential 
substrates of cyclin B2. 

The disclosed NOV46 nucleic acid of the invention encoding a G2/mitotic-specific 
cyclin B2-like protein includes the nucleic acid whose sequence is provided in Table 46A or a 
fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose 
bases may be changed from the corresponding base shown in Table 46A while still encoding a 
protein that maintains its G2/mitotic-specific cyclin B2-like activities and physiological 
functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids 
whose sequences are complementary to those just described, including nucleic acid fragments 
that are complementary to any of the nucleic acids just described. The invention additionally 
includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures 
include chemical modifications. Such modifications include, by way of nonlimiting example, 
modified bases, and nucleic acids whose sugar phosphate backbones are modified or 
derivatized. These modifications are carried out at least in part to enhance the chemical 
stability of the modified nucleic acid, such that they may be used, for example, as antisense 
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic 
acids, and their complements, up to about 9 percent of the bases may be so changed. 

The disclosed NOV46 protein of the invention includes the G2/mitotic-specific cyclin 
B2-like protein whose sequence is provided in Table 46B. The invention also includes a 
mutant or variant protein any of whose residues may be changed from the corresponding 
residue shown in Table 46B while still encoding a protein that maintains its G2/mitotic-specific 
cyclin B2-like activities and physiological functions, or a functional fragment thereof. 
In the mutant or variant protein, up to about 21 percent of the residues may be so changed. 
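
To give a sense of scale for the variant limits stated above, the following sketch converts the stated percentages into approximate position counts for NOV46. It is an illustration under stated assumptions: the function name `max_changes` is hypothetical, and treating "up to about" as a hard limit that is rounded down is a simplification, not something the text specifies.

```python
def max_changes(length: int, percent: int) -> int:
    """Largest whole number of positions not exceeding `percent` of `length`."""
    if length < 0 or not 0 <= percent <= 100:
        raise ValueError("length must be non-negative and percent within 0..100")
    return (length * percent) // 100


# NOV46 lengths and limits as stated in the text.
print(max_changes(1247, 9))   # nucleotide positions: 112
print(max_changes(404, 21))   # amino acid residues: 84
```

The two limits (9 and 21 percent) are the complements of the 91% nucleotide identity and 79% amino acid identity reported for the closest database matches; whether they were derived that way is not stated.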

The invention further encompasses antibodies and antibody fragments, such as Fab or 
(Fab)2, that bind immunospecifically to any of the proteins of the invention. 

The above defined information for this invention suggests that this G2/mitotic-specific 
cyclin B2-like protein (NOV46) may function as a member of a "G2/mitotic-specific cyclin 
B2 family". Therefore, the NOV46 nucleic acids and proteins identified here may be useful in 
potential therapeutic applications implicated in (but not limited to) various pathologies and 
disorders as indicated below. The potential therapeutic applications for this invention include, 
but are not limited to: protein therapeutic, small molecule drug target, antibody target 
(therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic 
marker, gene therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo 
and in vitro of all tissues and cell types composing (but not limited to) those defined here. 

The NOV46 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the G2/mitotic-specific 
cyclin B2-like protein (NOV46) may be useful in gene therapy, and the G2/mitotic-specific 
cyclin B2-like protein (NOV46) may be useful when administered to a subject in need thereof. 
By way of nonlimiting example, the compositions of the present invention will have efficacy 
for treatment of patients suffering from Cardiomyopathy, Atherosclerosis, Hypertension, 
Congenital heart defects, Aortic stenosis, Atrial septal defect (ASD), Atrioventricular (A-V) 
canal defect, Ductus arteriosus, Pulmonary stenosis, Subaortic stenosis, Ventricular septal 
defect (VSD), valve diseases, Tuberous sclerosis, Scleroderma, Obesity, Transplantation, 
Adrenoleukodystrophy, Congenital Adrenal Hyperplasia, Hemophilia, 
Hypercoagulation, Idiopathic thrombocytopenic purpura, Immunodeficiencies, Graft versus 
host, Von Hippel-Lindau (VHL) syndrome, Alzheimer's disease, Stroke, Tuberous sclerosis, 
hypercalcemia, Parkinson's disease, Huntington's disease, Cerebral palsy, Epilepsy, 
Lesch-Nyhan syndrome, Multiple sclerosis, Ataxia-telangiectasia, Leukodystrophies, Behavioral 
disorders, Addiction, Anxiety, Pain, Neuroprotection, or other pathologies or conditions. The 
NOV46 nucleic acid encoding the G2/mitotic-specific cyclin B2-like protein of the invention, 
or fragments thereof, may further be useful in diagnostic applications, wherein the presence or 
amount of the nucleic acid or the protein is to be assessed. 

NOV46 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immunospecifically to the novel NOV46 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX 
Antibodies" section below. The disclosed NOV46 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding the pathology of the disease and in developing new drug targets for various 
disorders. 

NOV47 

A disclosed NOV47 nucleic acid of 15645 nucleotides (also referred to as CG57764-01) 
encoding an ALR-like protein is shown in Table 47A. The start and stop codons are in bold 
letters. 

Table 47A. NOV47 nucleotide sequence (SEQ ID NO:111). 

ATGTCCCCTCCACCTGAAGAGTCACCCATGTCTCCACCACCGGAGGCATCTCGTCTGTTC 
CCACCATTTGAAGAGTCTCCTCTGTCCCCTCCACCTGAGGAGTCTCCCCTTTCCCCACCA 
CCTGAGGCATCACGCCTGTCCCCACCACCTGAGGACTCGCCTATGTCCCCACCACCTGAA 
GAATCACCTATGTCCCCCCCACCTGAGGTATCGCGCCTATCCCCCCTGCCTGTGGTGTCA 
CGCCTGTCTCCACCGCCTGAGGAATCTCCCTTGTCCCCACCGCCTGAGGAGTCTCCCACG 
TCCCCTCCACCTGAGGCTTCACGCCTCTCCCCACCACCTGAGGACTCCCCCACATCCCCA 
CCACCTGAGGACTCACCTGCTTCCCCACCACCGGAGGACTCGCTCATGTCCCTGCCGCTG 
GAGGAGTCACCCCTGTTGCCACTACCTGAGGAGCCGCAACTCTGCCCCCGGTCCGAGGGG 
CCGCACCTGTCACCCCGGCCTGAGGAGCCGCACCTGTCCCCCCGGCCTGAGGAGCCACAC 
CTATCTCCGCAGGCTGAGGAGCCACACCTGTCCCCCCAGCCTGAGGAGCCATGCCTATGC 
GCTGTGCCTGAGGAGCCACACTTGTCCCCCCAGGCTGAGGGACCACATCTGTCCCCTCAG 
CCTGAGGAATTGCACCTGTCCCCCCAGACTGAGGAGCCGCACCTGTCTCCTGTGCCTGAG 
GAGCCATGCTTGTCCCCCCAACCTGAGGAATCACACCTGTCCCCCCAGTCTGAGGAGCCA 
TGCCTGTCCCCCCGGCCTGAGGAATCGCATCTGTCCCCTGAGCTTGAGAAGCCACCCCTG 
TCCCCTCGGCCTGAAAAGCCCCCTGAGGAGCCAGGCCAATGCCCTGCACCTGAGGAGCTG 
CCCTTGTTCCCTCCCCCTGGGGAACCATCCTTATCTCCCTTGCTTGGAGAGCCAGCCCTG 
TCTGAGCCTGGGGAACCACCTCTGTCCCCTCTGCCCGAGGAGCTGCCGTTGTCCCCATCT 
GGGGAGCCATCCTTGTCGCCTCAGCTGATGCCACCAGATCCCCTTCCTCCTCCACTCTCA 
CCCATTATCACAGCTGCGGCCCCACCGGCCCTGTCTCCTTTGGGGGAGTTAGAGTACCCC 
TTTGGTGCCAAAGGGGACAGTGACCCTGAGTCACCGTTGGCTGCCCCCATCCTGGAGACA 
CCCATCAGCCCTCCACCAGAAGCTAACTGCACTGACCCTGAGCCTGTCCCCCCTATGATC 
CTTCCCCCATCTCCAGGCTCCCCAGTGGGGCCGGCTTCTCCCATCCTGATGGAGCCCCTT 
CCTCCTCAGTGTTCGCCACTCCTTCAGCATTCCCTGGTTCCCCAAAACTCCCCTCCTTCC 
CAGTGCTCTCCTCCTGCCCTACCACTGTCCGTTCCCTCCCCGTTGAGTCCCATAGGGAAG 
GTAGTGGGGGTCTCAGATGAGGCTGAGCTGCACGAGATGGAGACTGAGAAAGTTTCAGAA 
CCTGAATGCCCAGCCTTGGAACCCAGTGCCACCAGTCCTCTCCCTTCCCCAATGGGGGAC 
CTTTCCTGCCCCGCCCCCAGCCCTGCCCCAGCCCTGGATGACTTCTCTGGCCTAGGGGAA 
GACACAGCCCCTCTGGATGGGATTGATGCTCCGGGTTCACAGCCAGAGCCTGGACAGACC 
CCTGGCAGTTTGGCTAGTGAACTTAAAGGCTCCCCTGTGCTCCTGGACCCCGAGGAGCTG 
GCCCCTGTGACCCCTATGGAGGTCTACCCCGAATGCAAGCAGACAGCAGGGCGGGGCTCA 
CCATGTGAAGAACAGGAAGAGCCACGTGCACCGGTGGCCCCCACACCACCCACTCTCATC 
AAATCCGACATCGTTAACGAGATCTCTAATCTGAGCCAGGGTGATGCCAGTGCCAGTTTT 
CCTGGCTCAGAGCCCCTCCTGGGCTCTCCAGACCCGGAGGGGGGTGGCTCCCTGTCCATG 
GAGTTGGGGGTCTCTACGGATGTTAGTCCAGCCCGAGATGAGGGCTCCCTACGGCTCTGT 
ACTGACTCACTGCCAGAGACTGATGACTCACTATTGTGCGATGCTGGGACAGCTATCAGC 
GGAGGCAAAGCTGAGGGGGAGAAGGGGCGGCGGCGCAGCTCCCCAGCCCGTTCCCGCATC 
AAACAGGGTCGCAGCAGCAGTTTCCCAGGAAGACGCCGGCCTCGTGGAGGAGCCCATGGA 
GGGCGTGGTAGAGGACGGGCCCGGCTAAAGTCAACTGCTTCTTCCATTGAGACTCTGGTA 
GTTGCTGACATTGATAGCTCTCCCAGTAAGGAGGAGGAGGAAGAAGATGATGACACCATG 
CAGAATACCGTGGTTCTCTTCTCCAACACAGACAAATTTGTCCTAATGCAGGACATGTGT 
GTGGTATGTGGCAGCTTTGGCCGGGGGGCAGAGGGCCACCTCCTTGCCTGTTCGCAGTGC 
TCTCAGTGCTATCACCCTTACTGTGTCAACAGCAAGATCACCAAGGTGATGCTGCTCAAG 
GGCTGGCGTTGTGTGGAGTGTATTGTGTGTGAGGTGTGTGGCCAGGCCTCCGACCCCTCA 
CGCCTGCTGCTCTGTGATGACTGTGATATTAGCTACCACACATACTGCCTGGACCCCCCA 
CTGCTCACCGTCCCCAAGGGCGGCTGGAAGTGCAAGTGGTGTGTGTCCTGTATGCAGTGT 
GGGGCTGCTTCCCCTGGCTTCCACTGTGAATGGCAGAATAGTTACACACACTGTGGGCCC 
TGTGCCAGCCTGGTGACCTGCCCTATCTGTCATGCTCCTTACGTAGAAGAGGACCTACTA 
ATCCAGTGCCGCCACTGTGAACGGTGGATGCATGCAGGCTGTGAGAGCCTCTTCACAGAG 
GACGATGTGGACCACGCACCCGATGAAGGCTTTGACTGTGTCTCCTGCCAGCCCTACGTG 
GTAAAGCCTGTGGCGCCTGTTGCACCTCCAGAGCTGGTGCCCATGAAGGTGAAAGAGCCA 
GAGCCCCAGTACTTTCGCTTCGAAGGCGTGTGGCTGACAGAAACTGGCATGGCCTTGCTG 
CGTAACCTGACCATGTCACCACTGCACAAGCGGCGCCAACGGCGAGGACGGCTTGGCCTC 
CCAGGCGAGGCAGGATTGGAGGGTTCTGAGCCCTCAGATGCCCTTGGCCCTGATGACAAG 
AAGGATGGGGACCTGGACACCGATGAGCTGCTCAAGGGTGAAGGTGGTGTGGAGCACATG 
GAGTGCGAAATTAAACTGGAGGGCCCCGTCAGCCCTGATGTGGAGCCTGGCAAAGAGGAG 
ACCGAGGAAAGCAAAAAACGCAAGCGTAAACCATATCGGCCTGGCATTGGTGGTTTCATG 
GTGCGACAGCGGAAATCCCACACACGCACGAAAAAGGGGCCTGCTGCACAGGCGGAGGTG 
TTGAGTGGGGATGGGCAGCCCGACGAGGTGATACCTGCTGACCTGCCTGCAGAGGGCGCC 
GTGGAGCAGAGCTTAGCTGAAGGGGATGAGAAGAAGAAGCAACAGCGGCGAGGGCGCAAG 
AGGAGCAAACTGGAGGGCATGTTCCCTGCTTACTTGCAGGAAGCCTTCTTTGGGAAGGAG 
CTGCTGGACCTGAGCCGTAAGGCCCTTTTTGCAGTTGGGGTGGGCCGGCCAAGCTTTGGA 
CTAGGGACCCCAAAAGCCAAGGGAGATGGAGGCTCAGAAAGGAAGGAACTCCCCACATCG 
CAGAAAGGAGATGATGGTCCAGATATTGCAGATGAAGAATCCCGTGGCCTCGAGGGCAAA 
GCCGATACACCAGGACCTGAGGATGGGGGCGTGAAGGCATCCCCAGTGCCCAGTGACCCT 
GAGAAGCCAGGCACCCCAGGTGAAGGGATGCTTAGCTCTGACTTAGACAGGATTTCCACA 
GAAGAACTGCCCAAGATGGAATCCAAGGACCTGCAGCAGCTCTTCAAGGATGTTCTGGGC 
TCTGAACGAGAACAGCATCTGGGTTGTGGAACCCCTGGCCTAGAAGGCAGCCGTACGCCA 
CTGCAGAGGCCCTTTCTTCAAGGTGGACTCCCTTTGGGCAATCTGCCCTCCAGCAGCCCA 
ATGGACTCCTACCCAGGCCTCTGCCAGTCCCCGTTCCTGGATTCTAGGGAGCGCGGGGGC 
TTCTTTAGCCCGGAACCCGGTGAGCCCGACAGCCCCTGGACGGGCTCAGGTGGCACCACG 
CCCTCCACCCCCACAACCCCCACCACGGAGGGTGAGGGCGACGGACTCTCCTATAACCAG 
CGGAGTCTTCAGCGCTGGGAGAAGGATGAGGAGTTGGGCCAGCTGTCCACCATCTCGCCT 
GTGCTCTATGCCAACATTAATTTTCCTAATCTCAAGCAAGACTACCCAGACTGGTCAAGC 
CGTTGCAAACAAATCATGAAGCTCTGGAGAAAGGTTCraGCAGCTGACAAAGCCCCCTAC 
CTGCAAAAGGCCAAAGATAACCGGGCAGCTCACCGCATCAACAAGGTGCAGAAGCAGGCT 
GAGAGCCAGATCAACAAGCAGACCAAGGTGGGCGACATAGCCCGTAAGACTGACCGACCG 
GCCCTACATCTCCGCATTCCCCCGCAGCCAGGGGCACTGGGCAGCCCGCCCCCCGCTGCT 
GCCCCCACCATTTTCATTGGCAGCCCCACTACCCCCGCCGGCTTGTCTACCTCTGCGGAC 
GGGTTCCTGAAGCCGCCGGCGGGCTCGGTGCCTGGCCCTGACTCGCCTGGTGAGCTCTTC 
CTCAAGCTCCCACCCCAGGTGCCCGCCCAAGCGCCTTCGCAGGACCCCTTTGGACTGGCC 
CCTGCCTATCCCCTGGAGCCCCGCTTCCCCACGGCACCGCCCACCTATCCCCCCTATCCT 
AGTCCTACGGGGGCCCCTGCGCAGCCCCCGATGCTGGGCGCCTCATCTCGTCCTGGGGCT 
GGCCAGCCAGGGGAATTCCACACTACCCCACCTGGCACCCCCAGACACCAGCCCTCCACA 
CCTGACCCGTTCCTCAAACCCCGCTGCCCCTCGCTGGATAACTTGGCTGTGCCTGAGAGC 
CCTGGGGTAGGGGGAGGCAAAGCTTCCGAGCCCCTGCTCTCGCCCCCACCTTTTGGGGAG 
TCCCGGAAGGCCCTAGAGGTGAAGAAGGAAGAGCTTGGGGCATCCTCTCCTAGCTATGGG 
CCCCCAAACCTGGGCTTTGTTGACTCACCCTCCTCAGGCACCCACCTGGGTGGCCTGGAG 
TTAAAGACACCTGATGTCTTCAAAGCCCCCCTGACCCCTCGGGCATCTCAGGTAGAGCCC 
CAGAGCCCGGGCTTGGGCCTAAGGCCCCAGGAGCCACCCCCTGCCCAGGCTTTGGCACCT 
TCTCCTCCAAGTCACCCAGACATCTTTCGCCCTGGCTCCTACACTGACCCATATGCTCAG 
CCCCCATTGACTCCTCGGCCCCAACCTCCGCCCCCTGAGAGCTGCTGTGCTCTGCCCCCT 
CGCTCACTGCCCTCCGACCCTTTCTCCCGAGTGCCTGTCAGTCCTCAGTCCCAGTCCAGC 
TCCCAGTCTCCACTGACACCCCGGCCTCTGTCTGCTGAAGCTTTTTGCCCATCACCCGTT 
ACCCCTCGCTTCCAGTCCCCTGACCCTTATTCTCGCCCACCCTCACGCCCTCAGTCCCGT 
GACCCATTTGCCCCATTGCATAAGCCACCCCGACCCCAGCCCCCTGAAGTTGCCTTTAAG 
GCTGGGTCTCTAGCCCACACTTCGCTGGGGGCTGGGGGGTTCCCAGCAGCCCTGCCCGCG 
GGGCCAGCAGGTGAGCTCCATGCCAAGGTCCCAAGTGGGCAGCCCCCCAATTTTGTCCGG 
TCCCCTGGGACGGGTGCATTTGTGGGCACCCCCTCTCCCATGCGTTTCACTTTCCCTCAG 
GCAGTAGGGGAGCCTTCCCTAAAGCCCCCTGTCCCTCAGCCTGGTCTCCCGCCACCCCAT 
GGGATO^CAGCCATTTTGGGCCCGGCCCCACCTTGGGCAAGCCTCAAAGCACAAACTAC 
ACAGTAGCCACAGGGAACTTCCACCCATCGGGCAGCCCCCTGGGGCCCAGCAGCGGGTCC 
ACAGGGGAGAGCTATGGGCTGTCCCCACTACGCCCTCCGTCGGTTCTGCCACCACCTGCA 
CCCGACGGATCCCTCCCCTACCTGTCCCATGGAGCCTCACAGCGATCAGGCATCACCTCT 
CCTGTCGAAAAGCGAGAAGACCCAGGGACTGGAATGGGTAGCTCTTTGGCGACAGCTGAA 
CTCCCAGGTACCCAGGACCCAGGCATGTCCGGCCTTAGCCAAACAGAGCTGGAGAAGCAA 
CGGCAGCGCCAGCGGCTACGAGAGCTGCTGATTCGGCAGCAGATCCAGCGCAACACCCTG 
CGGCAGGAGAAGGAAACAGCTGCAGCAGCTGCGGGAGCAGTGGGGCCTCCAGGCAGCTGG 
GGTGCTGAGCCCAGCAGCCCTGCCTTTGAGCAGCTGAGTCGAGGCCAGACCCCCTTTGCT 
GGGACACAGGACAAGAGCAGCCTTGTGGGGTTGCCCCCAAGCAAGCTGAGTGGCCCCATC 
CTGGGGCCAGGGTCCTTCCCTAGCGATGACCGACTCTCCCGGCCACCTCCACCAGCCACG 
CCTTCCTCTATGGATGTGAACAGCCGGCAACTGGTAGGAGGCTCCCAAGCTTTCTATCAG 
CGAGCACCCTATCCTGGGTCCCTGCCCTTACAGCAGCAACAGCAACAACTGTGGCAGCAA 
CAACAGGCAACAGCAGCAACCTCCATGCGATTTGCCATGTCAGCTCGCTTTCCATCAACT 
CCTGGACCTGAACTTGGCCGCCAAGCCCTAGGTTCCCCGTTGGCGGGAATTTCCACCCGT 
CTGCCAGGCCCTGGTGAGCCAGTGCCTGGTCCAGCTGGTCCTGCCCAGTTCATTGAGCTG 
CGGCACAATGTACAGAAAGGACTGGGACCTGGGGGCACTCCGTTTCCTGGTCAGGGCCCA 
CCTCAGAGACCCCGTTTTTACCCTGTAAGTGAGGACCCCCACCGACTGGCTCCTGAAGGG 
CTTCGGGGCCTGGCGGTATCAGGTCTTCCCCCACAGAAACCCTCAGCCCCACCGGCCCCT 
GAATTGAACAACAGTCTTCATCCAACACCCCACACCAAGGGTCCTACCCTGCCAACTGGT 
TTGGAGCTGGTCAACCGGCCCCCGTCGAGCACTGAGCTTGGCCGCCCCAATCCTCTGGCC 
CTGGAAGCTGGGAAGTTGCCCTGTGAGGATCCCGAGCTGGATGACGATTTTGATGCCCAC 
AAGGCCCTAGAGGATGATGAAGAGCTTGCTCACCTGGGTCTGGGTGTGGATGTGGCCAAG 
GGTGATGATGAACTTGGCACCTTAGAAAACCTGGAGACCAATGACCCCCACTTGGATGAC 
CTGCTCAATGGAGACGAGTTTGACCTGCTGGCATATACTGATCCTGAGCTGGACACTGGG 
GACAAGAAGGATATCTTCAATGAGCACCTGAGGCTGGTAGAATCGGCTAATGAGGAGGCT 
GAACGGGAGGCCCTGCTGCGGGGGGTGGAGCCAGGACCCTTGGGCCCTGAGGAGCGCCCT 
CCCCCTGCTGCTGATGCCTCTGAACCCCGCCTGGCATCTGTGCTCCCTGAGGTGAAGCCC 
AAGGTGGAGGAGGGTGGACGCCACCCTTCTCCTTGCCAATTCACCATTGCTACCCCCAAG 
GTAGAGCCCGCACCTGCTGCCAATTCCCTTGGCCTGGGGCTAAAGCCAGGACAGAGCATG 
ATGGGCAGCCGGGATACCCGGATGGGCACAGGGCCATTTTCTAGCAGTGGGCACACAGCT 
GAGAAGGCCTCCTTTGGGGCCACGGGAGGGCCACCAGCTCACCTGCTGACCCCCAGCCCA 
CTGAGTGGCCCAGGAGGATCCTCCCTGCTGGAAAAGTTTGAGCTCGAGAGTGGGGCTTTG 
ACCTTGCCTGGTGGACCTGCAGCATCTGGGGATGAGCTAGACAAGATGGAGAGCTCACTG 
GTAGCCAGCGAGTTACCCCTGCTCATTGAGGACCTGTTGGAGCATGAGAAGAAGGAGCTG 
CAGAAGAAGCAGCAGCTTTCAGCACAGTTGCAGCCTGCCCAGCAGCAGCAGCAACAGCAG 
CAGCAGCATTCCCTACTGCCTGCACCAGGCCCTGCCCAGGCCATGTCTTTGCCACATGAG 
GGCTCTTCTCCCAGTTTGGCTGGGTCCCAACAGCAGCTTTCCCTGGGTCTTGCAGTTGCC 
CGACAGCCAGGTTTGCCCCAGCCACTGATGCCCACCCAGCCACCAGCTCATGCCCTCCAG 
CAACGCCTGGCTCCATCCATGGCTATGGTGTCCAATCAAGGGCATATGCTAAGTGGGCAG 
CATGGAGGGCAGGCAGGCTTGGTACCCCAGCAGAGCTCACAGCCAGTGCTATCACAGAAG 
CCCATGGGCACCATGCCACCTTCCATGTGCATGAAGCCGCAGCAATTGGCAATGCAGCAG 
CAGCTGGCAAACAGCTTCTTCCCAGATACAGACCTGGACAAATTTGCTGCAGAAGATATC 
ATTGGTCCCATTGCAAAGGCCAAGATGGTGGCTTTGAAAGGCATCAAGAAAGTGATGGCT 
CAGGGCAGCATTGGGGTGGCACCTGGTATGAACAGACAGCAAGTGTCTCTGCTAGCCCAG 
AGGCTCTCGGGGGGACCTAGCAGTGATCTGCAGAACCATGTGGCAGCTGGGAGTGGCCAG 
GAGCGGAGTGCTGGTGATCCCTCCCAGCCTCGTCCCAACCCGCCCACTTTTGCTCAGGGA 
GTGATCAATGAAGCTGACCAGCGGCAGTATGAGGAGTGGCTGTTCCATACCCAGCAGCTC 
CTACAGATGCAGCTGAAGGTGCTAGAGGAGCAGATTGGTGTACACCGCAAGTCCCGGAAG 
GCTCTGTGTGCCAAGCAGCGCACTGCCAAAAAAGCTGGCCGTGAGTTCCCAGAAGCTGAT 
GCTGAGAAGCTCAAGCTGGTTACAGAGCAGCAGAGCAAGATCCAGAAACAACTGGATCAG 
GTCCGGAAACAGCAGAAGGAGCACACTAATCTCATGGCAGAATATCGGAACAAGCAGCAG 
CAACAACAGCAGCAGCAGCAGCAACAACAGCAACAGCACTCAGCTGTGCTGGCTCTCAGC 
CCTTCCCAGAGTCCCCGGCTGCTCACCAAGCTCCCTGGTCAGCTGCTCCCTGGCCATGGG 
CTGCAGCCACCACAGGGGCCTCCGGGTGGGCAAGCCGGAGGTCTTCGCCTGACCCCTGGG 
GGTATGGCACTACCTGGACAGCCTGGTGGCCCCTTCCTTAATACAGCTCTGGCCCAACAG 
CAGCAACAGCAACATTCTGGTGGGGCTGGATCCCTGGCTGGCCCTTCAGGGGGCTTCTTC 
CCTGGCAACCTTGCTCTTCGAAGCCTCGGACCTGATTCAAGGCTTTTACAGGAAAGGCAG 
CTGCAGCTGCAGCAGCAACGTATGCAGCTGGCCCAGAAACTGCAGCAGCAGCAGCAGCAG 
CAACAGCAGCAGCAGCACCTTCTAGGACAGGTGGCAATCCAGCAGCAACAGCAGCAGGGT 
CCTGGAGTACAGACAAACCAAGCTCTGGGTCCCAAGCCCCAGGGCCTTATGCCTCCCAGC 
AGCCACCAAGGCCTCCTGGTCCAGCAGCTGTCCCCTCAACCACCCCAGGGGCCCCAGGGC 
ATGCTGGGCCCTGCCCAGGTGGCTGTGTTGCAGCAGCAGCACCCTGGAGCTTTGGGCCCC 
CAGGGCCCTCACAGACAGGTGCTTATGACCCAGTCCCGGGTGCTCAGTTCCCCCCAGCTG 
GCACAGCAGGGTCAGGGCCTTATGGGACACAGGCTGGTCACAGCCCAGCAGCAGCAGCAG 
CAACAACAGCACCAACAGCAAGGGTCCATGGCAGGGCTGTCCCATCTTCAGCAAAGTCTG 
ATGTCACACAGTGGGCAGCCCAAACTGAGCGCTCAGCCCATGGGCTCTTTACAGCAGCTT 
CAGCAGCAGCAGCAGCTGCAACAGCAACAGCAACTTCAGCAGCAGCAGCAGCAGCAGCTA 
CAACAGCAACAGCAACTTCAGCAGCAACAGCTTCAACAGCAGCAACAGCAGCAGCAGCTT 
CAACAACAGCAGCAGCAACAGCTTCAACAGCAGCAACAGCAGCTACAACAGCAACAGCAA 
CAACAACAGCAGCAGTTTCAACAGCAGCAGCAACAGCAGCAGATGGGCCTTTTAAACCAG 
AGTCGAACTTTACTGTCCCCTCAGCAACAACAGCAGCAGCAAGTGGCACTTGGCCCTGGC 
ATGCCAGCAAAGCCTCTTCAACACTTTTCTAGCCCTGGAGCCCTGGGTCCAACCCTCCTC 
CTGACGGGCAAGGAACAAAACACCGTAGACCCAGCCGTTTCTTCAGAGGCCACTGAGGGG 
CCCTCTACACATCAGGGAGGGCCGTTAGCAATAGGAACTACCCCTGAGTCAATGGCCACT 
GAACCAGGAGAGGTAAAGCCCTCACTCTCTGGGGACTCACAACTCCTGCTTGTCCAACCC 
CAGCCCCAGCCTCAGCCCAGCTCTCTGCAGCTGCAGCCACCTCTGAGGCTTCCAGGACAA 
CAGCAGCAGCAAGTTAGCCTGCTCCACACAGCAGGTGGAGGAAGCCATGGGCAGCTAGGC 
AGTGGATCATCTTCTGAGGCCTCATCTGTGCCCCACCTGCTGGCTCAGCCCTCTGTTTCC 
TTAGGGGATCAGCCTGGGTCCATGACCCAGAACCTTCTGGGCCCCCAACAGCCCATGCTA 
GAGCGGCCCATGCAAAATAATACAGGGCCACAACCTCCCAAACCAGGACCTGTCCTCCAG 
TCTGGGCAGGGTCTGCCTGGGGTTGGAATCATGCCTACGGTGGGTCAGCTTCGAGCACAG 
CTCCAAGGAGTCCTGGCCAAAAACCCACAGCTGCGGCACTTAAGTCCTCAGCAGCAGCAG 
CAGCTACAGGCACTCCTCATGCAGCGGCAGCTGCAGCAGAGTCAGGCAGTACGCCAGACC 
CCACCCTACCAGGAGCCTGGGACCCAGACCTCTCCCCTCCAGGGCCTCCTGGGCTGCCAA 
CCTCAACTTGGGGGCTTCCCTGGACCACAGACAGGCCCCCTCCAGGAGCTAGGGGCAGGG 
CCTCGACCTCAGGGCCCACCCCGGCTCCCTGCCCCACCAGGAGCCTTATCTACAGGACCA 
GTCCTTGGCCCTGTCCATCCCACACCTCCACCATCCAGCCCTCAAGAGCCAAAGAGACCT 
TCACAATTACCTTCCCCCAGCTCCCAGCTTCCCACTGAGGCCCAGCTCCCTCCCACCCAT 
CCAGGGACCCCCAAACCTCAGGGGCCAACCTTGGAGCCGCCTCCTGGGAGGGTCTCACCT 
GCTGCTGCCCAGCTTGCAGATACCTTGTTTAGCAAGGGTCTGGGACCTTGGGATCCCCCA 
GACAACCTAGCAGAAACCCAGAAGCCAGAGCAGAGCAGCCTGGTACCTGGGCATCTGGAC 
CAGGTGAATGGACAGGTGGTGCCTGAGGCATCCCAACTCAGCATCAAGCAGGAACCTCGG 
GAAGAGCCATGTGCCCTGGGAGCCCAGTCAGTGAAGAGGGAGGCCAATGGGGAGCCAATA 
GGGGCACCAGGAACCAGCAACCACCTCCTGCTGGCAGGCCCTCGCTCAGAAGCTGGGCAT 
CTGCTCTTGCAGAAGCTACTCCGGGCAAAGAATGTGCAACTCAGCACTGGGCAGGGGTCC 
GAGGGGCTGCGAGCTGAGATCAACGGGCACATTGACAGCAAGCTGGCTGGGCTGGAGCAG 
AAACTACAGGGTACCCCCAGCAACAAGGAGGATGCAGCAGCAAGGAAGCCTTTGACACCG 
AAGCCCAAGCGGGTACAGAAGGCAAGCGACAGGTTGGTGAGCTCCCGAAAGAAGCTGCGG 
AAGGAGGACGGCGTCAGGGCCAGCGAGGCCTTGCTGAAACAGCTGAAACAGGAGCTGTCC 
CTGCTGCCCCTAACGGAGCCTGCTATCACCGCCAATTTTAGCCTCTTTGCCCCCTTTGGC 
AGTGGCTGCCCAGTCAATGGGCAGAGCCAGCTGAGGGGGGCCTTTGGAAGTGGGGCGCTG 
CCCACTGGCCCTGACTACTATTCCCAGCTGCTTACCAAGAATAACCTGAGTAACCCGCCG 
ACACCACCCTCGTCGCTGCCCCCCACCCCACCCCCATCGGTGCAGCAGAAGATGGTGAAT 
GGCGTCACCCCATCTGAAGAGCTGGGGGAGCACCCCAAGGATGCTGCCTCTGCCCGGGAT 
AGTGAAAGGGCACTGAGGGATACTTCAGAGGTGAAGAGTCTAGACCTGCTGGCTGCCTTG 
CCTACACCCCCTCACAATCAGACTGAGGATGTCAGGATGGAGAGTGATGAGGATAGCGAT 
TCTCCTGACAGCATTGTGCCAGCTTCATCCCCTGAGAGCATCTTGGGGGAGGAGGCCCCT 
CGTTTCCCTCATCTGGGCTCAGGCCGGTGGGAGCAAGAGGACCGGGCCCTCTCCCCTGTC 
ATCCCCCTCATTCCTCGGGACAGCATCCCAGTCTTCCCAGATACCAAACCTTATGGGGCC 
CTTGGCCTGGAGGTCCCTGGAAAGCTGCCTGTCACAACTTGGGAAAAGGGCAAAGGAAGT 
GAGGTGTCAGTCATGCTCACAGTCTCTGCTGCTGCAGACAAGAACCTGAATGGCGTGATG 
GTGGCAGTGGCGGAGCTGCTGAGCATGAAGATCCCCAACTCCTATGAGGTGCTGTTCCCA 
GAGAGCCCCGCCCGGGGAGGCACTGAGCCAAAGAAGGGGGAAGCTGAGGGTCCTGGTGGG 
AAGGAAAAGGGTCTGGAAGGCAAGAGCCCAGACACTGGCCCTGATTGGCTGAAGCAGTTT 
GATGCAGTGTTGGCTGGCTATACCCTGAAGAGGCAACTAGACATCTTGAGCCTCCTGAAA 
CAGGAGAGCCCCGCCCCAGAGCCACCCACTCAGCACAGGTATACCTACAATGTCTCCAAT 
CTGGATGTGCGACAGCTCTCGGCCCCACCTCCTGAAGAACCCTCCCCGCCCCCTTCCCCC 
TTGGCACCTTCTCCTGCCAGTCCCCCTACTGAGCCCTTGGTTGAACTTCCCACCGAACCC 
TTGGCTGAGCCACCCGTCCCCTCACCTCTGCCACTGGCCTCATCCCCTGAATCAGCCCGA 
CCCAAGCCCCGTGCCCGGCCCCCTGAAGAAGGTGAAGATACCCGTCCTCCTCGCCTCAAG 
AAATGGAAAGGAGTGCGCTGGAAGCGGCGCTTACGAGGTGCCATGTTGGAGCTTTTTGGT 
GTGAACAGTCTGGAAGTAAAATTTAGGACCAGAAGCGAGAATGGCGTTTTAATCCATATC 
CAAGAAAGCAGCAATTACACTACTGTGAAGATTAAGAATGGCAAAGTATATTTTACATCC 
GATGCAGGAATTGCTGGGAAAGTGGAGAGAAATATTCCTGAAGTATATGTTGCAGACGGC 
CACTGGCACACTTTTCTAATTGGGAAAAATGGAACAGCAACAGTATTGTCTGTTGACAGA 
ATATATAACAGAGATATTATCCACCCTACTCAGGACTTCGGTGGCCTTGATGTGCTTACT 
ATATCACTTGGAGGAATTCCACCCAATCAAGCACATCGAGATGCCCAAACAGCAGGTTTT 
GATGGCTGCATTGCTTCTATGTGGTATGGTGGAGAAAGTCTTCCTTTCAGCGGGAAGCAT 
AGCTTGGCCTCCATCTCAAAAACAGATCCCTCAGTGAAGATTGGCTGCCGTGGCCCGAAC 
ATTTGTGCCAGCAACCCCTGCTGGGGTGATTTGCTGTGCATTAATCAGTGGTATGCCTAC 
AGGTGTGTCCCTCCTGGGGACTGTGCCTCCCACCCGTGCCAGAATGGTGGCAGCTGTGAG 
CCAGGCCTGCACTCCGGCTTCACCTGTAGCTGCCCAGACTCGCACACGGGAAGGACCTGT 
GAGATGGTGGTGGCCTGTCTTGGCGTCCTCTGTCCTCAGGGGAAGGTGTGCAAAGCTGGA 
AGTCCTGCGGGGCATGTCTGTGTTCTGAGTCAGGGCCCTGAAGAGATCTCTCTGCCTTTG 
TGGGCTGTGCCTGCCATCGTGGGCAGCTGCGCAACCGTCTTGGCCCTCCTGGTCCTTAGC 
CTGATCCTGTGTAACCAGTGCAGGGGGAAGAAGGCCAAAAATCCCAAAGAGGAGAAGAAA 
CCGAAGGAGAAGAAGAAAAAGGGAAGTGAGAACGTTGCTTTTGATGACCCTGACAATATC 
CCTCCCTATGGGGATGACATGACTGTGAGGAAGCAGCCTGAAGGGAACCCAAAACCAGAT 
ATCATTGAAAGGGAAAACCCCTACCTTATCTATGATGAAACTGATATTCCTCACAACTCA 
GAAACCATCCCCAGCGCCCCTTTGGCATCTCCAGAGCAGGAGATAGAGCACTATGACATT 
GACAACGCCAGCAGCATCGCCCCTTCGGATGCAGACATCATTCAACACTACAAGCAGTTC 
CGCAGCCACACACCAAAATTTTCAATCCAGAGGCACAGTCCCCTAGGCTTTGCAAGGCAA 
TCCCCCATGCCCTTAGGAGCAAGCAGTTTGACTTACCAGCCTTGATATGGTCAAGGTTTG 
AGAACCAGCTCCCTAAGCCACTCAGCATGCCCAACTCCCAACCCTCTGTCTCGACACAGT 
CCAGCCCCTTTCTCCAAATCTTCTACGTTCTATAGAAACAGCCCAGCAAGGGAATTGCAT 
CTTCCTATAAGGGATGGTAATACTTTGGAAATGCATGGTGACACCTGCCAACCTGGCATT 
TTCAACTATGCCACAAGGCTGGGAAGGAGAAGCAAGAGTCCTCAGGCCATGGCATCACAT 
GGTTCTAGACCAGGGAGTCGCCTAAAGCAGCCGATTGGGCAGATTCCACTGGAATCTTCT 
CCTCGAGTCGGACTTTCTATTGAAGAAGTGGAGAGGCTCAACACACCTCGCCCTAGAAAC 
CCAAGTATCTGCAGTGCAGACCATGGGAGGTCTTCTTCAGAGGAGGACTGCAGAAGGCCA 
CTGTCTAGAACAAGGAATCCAGCGGATGGCATTCCAGCTCCAGAATCCTCTTCTGATAGT 
GACTCCCATGAATCTTTCACTTGCTCAGAAATGGAATATGACAGGGAGAAGCCAATGGTA 
TATACTTCCAGAATGCCCAAATTATCTCAAGTCAATGAATCTGATGCAGATGATGAAGAT 
AATTATGGAGCCAGACTGAAGCCTCGAAGGTACCACGGTCGCAGGGCCGAGGGAGGACCT 
GTGGGCACCCAGG(^GCAGCACCAGGCACTGCTGACAACACACTGCCCATGAAGCTAGGG 
CAGCAAGCAGGGACTTTCAACTGGGACAACCTTTTGAACTGGGGCCCTGGCTTTGGCCAT 
TATGTAGATGTTTTTAAAGATTTGGCATCTCTTCCAGAAAAAGCAGCAGCAAATGAAGAA 
GGCAAAGCTGGGACAACTAAACCAGTCCCCAAAGATGGGGAAGCAGAACAGTATGTGTGA 
AGTTTATGTACTGGCACTATAAAATATAAAAACAAGAAATAATACTTCAAACCATTGTAA 
AGTTGCTGACTAGGTTGGGGTCACATTTGAAAAACAGGGCCAGTATGGGACTAGTGGGTG 
GGAGGGGAAAACTTTTAAAATTATTAACCACATTGCTGGCTGAAA 

In a search of public sequence databases, the NOV47 nucleic acid sequence, located 
on chromosome 12q12.14, has 14307 of 14312 bases (99%) identical to a gb:GENBANK- 
ID:AF010404|acc:AF010404.1 mRNA from Homo sapiens (Homo sapiens ALR mRNA, 
complete cds). Public nucleotide databases include all GenBank databases and the GeneSeq 
patent database. 

The disclosed NOV47 polypeptide (SEQ ID NO:112) encoded by SEQ ID NO:111 
has 5159 amino acid residues and is presented in Table 47B using the one-letter amino acid 
code. SignalP, Psort and/or Hydropathy results predict that NOV47 has no signal peptide and 
is likely to be localized in the cytoplasm with a certainty of 0.9800. 
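
The stated nucleotide and amino acid lengths can be cross-checked with simple arithmetic. The sketch below is a consistency check only: the function name `untranslated_length` is hypothetical, and it assumes a single uninterrupted open reading frame that ends in one stop codon.

```python
def untranslated_length(total_nucleotides: int, residues: int) -> int:
    """Nucleotides left over after the coding region plus one stop codon.

    Assumes one uninterrupted open reading frame (an illustrative simplification).
    """
    coding = 3 * (residues + 1)  # three bases per residue, plus the stop codon
    if coding > total_nucleotides:
        raise ValueError("stated protein length does not fit in the nucleotide sequence")
    return total_nucleotides - coding


# Lengths as stated in the text.
print(untranslated_length(1247, 404))    # NOV46: 32
print(untranslated_length(15645, 5159))  # NOV47: 165
```

In both cases the stated protein length fits within the disclosed nucleotide sequence, with the remainder attributable to sequence outside the coding region.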



Table 47B. Encoded NOV47 protein sequence (SEQ ID NO: 112). 



MSPPPEESPMSPPPEASRLFPPFEESPLSPPPEESPLSPPPEASRLSPPPEDSPMSPPPE 
ESPMSPPPEVSRLSPLPVVSRLSPPPEESPLSPPPEESPTSPPPEASRLSPPPEDSPTSP 
PPEDSPASPPPEDSLMSLPLEESPLLPLPEEPQLCPRSEGPHLSPRPEEPHLSPRPEEPH 
LSPQAEEPHLSPQPEEPCLCAVPEEPHLSPQAEGPHLSPQPEELHLSPQTEEPHLSPVPE 
EPCLSPQPEESHLSPQSEEPCLSPRPEESHLSPELEKPPLSPRPEKPPEEPGQCPAPEEL 
PLFPPPGEPSLSPLLGEPALSEPGEPPLSPLPEELPLSPSGEPSLSPQLMPPDPLPPPLS 
PIITAAAPPALSPLGELEYPFGAKGDSDPESPLAAPILETPISPPPEANCTDPEPVPPMI 
LPPSPGSPVGPASPILMEPLPPQCSPLLQHSLVPQNSPPSQCSPPALPLSVPSPLSPIGK 
WGVSDEAELHEMETEKVSEPECPALEPSATSPLPSPMGDLSCPAPSPAPALDDFSGLGE
DTAPLDGIDAPGSQPEPGQTPGSLASELKGSPVLLDPEELAPVTPMEVYPECKQTAGRGS 
PCEEQEEPRAPVAPTPPTLIKSDIVNEISNLSQGDASASFPGSEPLLGSPDPEGGGSLSM
ELGVSTDVSPARDEGSLRLCTDSLPETDDSLLCDAGTAISGGKAEGEKGRRRSSPARSRI 
KQGRSSSFPGRRRPRGGAHGGRGRGRALRKSTASSIETLWADIDSSPSKEEEEEDDDTM
QNTWLFSNTDKFVLMQDMCWCGSFGRGAEGHLLACSQCSQCYHPYCVNSKITKVMLLK 
GWRCVECIVCEVCGQASDPSRLLLCDDCDISYHTYCLDPPLLTVPKGGWKCKWCVSCMQC 
GAASPGFHCEWQNSYTHCGPCASLVTCPICHAPYVEEDLLIQCRHCERWMHAGCESLFTE
DDVDHAPDEGFDCVSCQPYWKPVAPVAPPELVPMKVKEPEPQYFRFEGVWLTETGMALL 
RNLTMSPLHKRRQRRGRLGLPGEAGLEGSEPSDALGPDDKKDGDLDTDELLKGEGGVEHM 
ECEIKLEGPVSPDVEPGKEETEESKKRKRKPYRPGIGGFMVRQRKSHTRTKKGPAAQAEV 
LSGDGQPDEVIPADLPAEGAVEQSLAEGDEKKKQQRRGRKRSKLEGMFPAYLQEAFFGKE 
LLDLSRKALFAVGVGRPSFGLGTPKAKGDGGSERKELPTSQKGDDGPDIADEESRGLEGK 
ADTPGPEDGGVKASPVPSDPEKPGTPGEGMLSSDLDRISTEELPKMESKDLQQLFKDVLG
SEREQHLGCGTPGLEGSRTPLQRPFLQGGLPLGNLPSSSPMDSYPGLCQSPFLDSRERGG
FFSPEPGEPDSPWTGSGGTTPSTPTTPTTEGEGDGLSYNQRSLQRWEKDEELGQLSTISP 
VLYAN1NFPNLKQDYPDWSSRCKQIMKLWRKVPAADKAPYLQKAKDNRAAHRINKVQKQA 
ESQINKQTKVGDIARKTDRPALHLRIPPQPGALGSPPPAAAPTIFIGSPTTPAGLSTSAD 
GFLKPPAGSVPGPDSPGELFLKLPPQVPAQAPSQDPFGLAPAYPLEPRFPTAPPTYPPYP 
SPTGAPAQPPMLGASSRPGAGQPGEFHTTPPGTPRHQPSTPDPFLKPRCPSLDNLAVPES 
PGVGGGKASEPLLSPPPFGESRKALEVKKEELGASSPSYGPPNLGFVDSPSSGTHLGGLE
LKTPDVFKAPLTPRASQVEPQSPGLGLRPQEPPPAQALAPSPPSHPDIFRPGSYTDPYAQ 
PPLTPRPQPPPPESCCALPPRSLPSDPFSRVPVSPQSQSSSQSPLTPRPLSAEAFCPSPV 
TPRFQSPDPYSRPPSRPQSRDPFAPLHKPPRPQPPEVAFKAGSLAHTSLGAGGFPAALPA 
GPAGELHAKVPSGQPPNFVRSPGTGAFVGTPSPMRFTFPQAVGEPSLKPPVPQPGLPPPH 
GINSHFGPGPTLGKPQSTNYTVATGNFHPSGSPLGPSSGSTGESYGLSPLRPPSVLPPPA 
PDGSLPYLSHGASQRSGITSPVEKREDPGTGMGSSLATAELPGTQDPGMSGLSQTELEKQ 
RQRQRLRELLIRQQIQRNTLRQEKETAAAAAGAVGPPGSWGAEPSSPAFEQLSRGQTPFA 
GTQDKSSLVGLPPSKLSGPILGPGSFPSDDRLSRPPPPATPSSMDVNSRQLVGGSQAFYQ 
RAPYPGSLPLQQQQQQLWQQQQATAATSMRFAMSARFPSTPGPELGRQALGSPLAGISTR 
LPGPGEPVPGPAGPAQFIELRHNVQKGLGPGGTPFPGQGPPQRPRFYPVSEDPHRLAPEG 
LRGLAVSGLPPQKPSAPPAPELNNSLHPTPHTKGPTLPTGLELVNRPPSSTELGRPNPLA 
LEAGKLPCEDPELDDDFDAHKALEDDEELAHLGLGVDVAKGDDELGTLENLETNDPHLDD 
LLNGDEFDLLAYTDPELDTGDKKDIFNEHLRLVESANEEAEREALLRGVEPGPLGPEERP 
PPAADASEPRLASVLPEVKPKVEEGGRHPSPCQFTIATPKVEPAPAANSLGLGLKPGQSM 
MGSRDTRMGTGPFSSSGHTAEKASFGATGGPPAHLLTPSPLSGPGGSSLLEKFELESGAL 
TLPGGPAASGDELDKMESSLVASELPLLIEDLLEHEKKELQKKQQLSAQLQPAQQQQQQQ 
QQHSLLPAPGPAQAMSLPHEGSSPSLAGSQQQLSLGLAVARQPGLPQPLMPTQPPAHALQ 
QRI^PSMAMVSNQGHMLSGQHGGQAGLVPQQSSQPVLSQKPMGTMPPSMCMKPQQLAMQQ 
QLANSFFPDTDLDKFAAEDIIGPIAKAKMVALKGIKKVMAQGSIGVAPGMNRQQVSLLAQ
RLSGGPSSDLQNHVAAGSGQERSAGDPSQPRPNPPTFAQGVINEADQRQYEEWLFHTQQL 
LQMQLKVLEEQIGVHRKSRKALCAKQRTAKKAGREFPEADAEKLKLVTEQQSKIQKQLDQ 
VRKQQKEHTNLMAEYRNKQQQQQQQQQQQQQQHSAVLALSPSQSPRLLTKLPGQLLPGHG 
LQPPQGPPGGQAGGLRLTPGGMALPGQPGGPFLNTALAQQQQQQHSGGAGSLAGPSGGFF
PGNLALRSLGPDSRLLQERQLQLQQQRMQLAQKLQQQQQQQQQQQHLLGQVAIQQQQQQG 
PGVQTNQALGPKPQGLMPPSSHQGLLVQQLSPQPPQGPQGMLGPAQVAVLQQQHPGALGP 
QGPHRQVLMTQSRVLSSPQLAQQGQGLMGHRLVTAQQQQQQQQHQQQGSMAGLSHLQQSL 
MSHSGQPKLSAQPMGSLQQLQQQQQLQQQQQLQQQQQQQLQQQQQLQQQQLQQQQQQQQL 
QQQQQQQLQQQQQQLQQQQQQQQQQFQQQQQQQQMGLLNQSRTLLSPQQQQQQQVALGPG 
MPAKPLQHFSSPGALGPTLLLTGKEQNTVDPAVSSEATEGPSTHQGGPLAIGTTPESMAT 
EPGEVKPSLSGDSQLLLVQPQPQPQPSSLQLQPPLRLPGQQQQQVSLLHTAGGGSHGQLG 
SGSSSEASSVPHLLAQPSVSLGDQPGSMTQNLLGPQQPMLERPMQNNTGPQPPKPGPVLQ 
SGQGLPGVGIMPTVGQLRAQLQGVLAKNPQLRHLSPQQQQQLQALLMQRQLQQSQAVRQT 
PPYQEPGTQTSPLQGLLGCQPQLGGFPGPQTGPLQELGAGPRPQGPPRLPAPPGALSTGP 
VLGPVHPTPPPSSPQEPKRPSQLPSPSSQLPTEAQLPPTHPGTPKPQGPTLEPPPGRVSP 
AAAQLADTLFSKGLGPWDPPDNLAETQKPEQSSLVPGHLDQVNGQWPEASQLSIKQEPR 
EEPCALGAQSVKREANGEPIGAPGTSNHLLLAGPRSEAGHLLLQKLLRAKNVQLSTGQGS 
EGLRAEINGHIDSKLAGLEQKLQGTPSNKEDAAARKPLTPKPKRVQKASDRLVSSRKKLR 
KEDGVRASEAL.LKQLKQELSLLPLTEPAITANFSLFAPFGSGCPVNGQSQLRGAFGSGAL 
PTGPDYYSQLLTKNNLSNPPTPPSSLPPTPPPSVQQKMVNGVTPSEELGEHPKDAASARD 
SERALRDTSEVKSLDLLAALPTPPHNQTEDVRMESDEDSDSPDSIVPASSPESILGEEAP 
RFPHLGSGRWEQEDRALSPVIPLIPRDSIPVFPDTKPYGALGLEVPGKLPVTTWEKGKGS
EVSVT4LTVSAAADKNLNGVMVAVAELLSMKIPNSYEVLFPESPARGGTEPKKGEAEGPGG 
KEKGLEGKSPDTGPDWLKQFDAVLAGYTLKRQLDILSLLKQESPAPEPPTQHRYTYNVSN
LDVRQLSAPPPEEPSPPPSPLAPSPASPPTEPLVELPTEPLAEPPVPSPLPLASSPESAR 
PKPRARPPEEGEDTRPPRLKKWKGVRWKRRLRGAMLELFGVNSLEVKFRTRSENGVLIHI 
QESSNYTTVKIKNGKVYFTSDAGIAGKVERNIPEVYVADGHWHTFLIGKNGTATVLSVDR 
IYNRDIIHPTQDFGGLDVLTISLGGIPPNQAHRDAQTAGFDGCIASMWYGGESLPFSGKH
SLASISKTDPSVKIGCRGPNICASNPCWGDLLCINQWYAYRCVPPGDCASHPCQNGGSCE 
PGLHSGFTCSCPDSHTGRTCEMWACLGVLCPQGKVCKAGSPAGHVCVLSQGPEEISLPL 
WAVPAIVGSCATVLALLVLSLILCNQCRGKKAKNPKEEKKPKEKKKKGSENVAFDDPDNI 
PPYGDDMTVRKQPEGNPKPDIIERENPYLIYDETDIPHNSETIPSAPLASPEQEIEHYDI 
DNASSIAPSDADIIQHYKQFRSHTPKFSIQRHSPLGFARQSPMPLGASSLTYQPSYGQGL 
RTSSLSHSACPTPNPLSRHSPAPFSKSSTFYRNSPARELHLPIRDGNTLEMHGDTCQPGI 
FNYATRLGRRSKSPQAMASHGSRPGSRLKQPIGQIPLESSPPVGLSIEEVERLNTPRPRN 
PSICSADHGRSSSEEDCRRPLSRTRNPADGIPAPESSSDSDSHESFTCSEMEYDREKPMV 
YTSRMPKLSQVNESDADDEDNYGALRKPRRYHGRRAEGGPVGTQAAAPGTADNTLPMKLG 
QQAGTFNWDNLLNWGPGFGHYVDVFKDLASLPEKAAANEEGKAGTTKPVPKDGEAEQYV 




A search of sequence databases reveals that the NOV47 amino acid sequence has 4776
of 4796 amino acid residues (99%) identical to, and 4779 of 4796 amino acid residues (99%)
similar to, the 4957 amino acid residue ptnr:SPTREMBL-ACC:O14687 protein from Homo
sapiens (Human) (ALR). Public amino acid databases include the GenBank databases,
SwissProt, PDB and PIR.
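
The identity figures quoted above can be checked with simple arithmetic. The following is a minimal sketch, not part of the original disclosure; the function names are hypothetical, and the assumption that whole-number percentages are obtained by truncation (rather than rounding) is inferred only from the identity counts quoted in this section, where 4776 of 4796 residues (about 99.58%) is reported as 99%.

```python
def percent_identity(matches: int, aligned: int) -> float:
    """Exact percent identity for `matches` identical positions out of `aligned`."""
    if aligned <= 0 or not 0 <= matches <= aligned:
        raise ValueError("need 0 <= matches <= aligned and aligned > 0")
    return 100.0 * matches / aligned


def reported_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, assuming truncation toward zero (an assumption)."""
    percent_identity(matches, aligned)  # reuse the argument validation
    return (100 * matches) // aligned


if __name__ == "__main__":
    # Counts quoted in the text for NOV47 (nucleic acid, then protein identity).
    for matches, aligned in ((14307, 14312), (4776, 4796)):
        exact = percent_identity(matches, aligned)
        print(f"{matches}/{aligned}: exact {exact:.2f}%, reported {reported_percent(matches, aligned)}%")
```

Under this reading, a reported value of 99% covers anything from 99.00% up to, but not including, 100%, so the nucleic acid match (5 mismatches in 14312 bases) is considerably closer than the rounded figure alone suggests.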

NOV47 is expressed in at least brain. This information was derived by determining the 
tissue sources of the sequences that were included in the invention including but not limited to 
SeqCalling sources, Public EST sources, Literature sources, and/or RACE sources. 

The disclosed NOV47 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 47C. 



Table 47C. BLAST results for NOV47 


Gene Index/Identifier                     Protein/Organism                        Length (aa)  Identity (%)     Positives (%)    Expect
gi|7512280|pir||T03455                    ALR protein - human                     4957         4384/4427 (99%)  4385/4427 (99%)  0.0
gi|18601907|ref|XP_028760.2| (XM_028760)  myeloid/lymphoid or mixed-lineage       3492         2930/2944 (99%)  2933/2944 (99%)  0.0
                                          leukemia 2 [Homo sapiens]
gi|3540281|gb|AAC34383.1| (AF056116)      All-1 related protein                   4823         648/1458 (44%)   822/1458 (55%)   0.0
                                          [Takifugu rubripes]
gi|13640139|ref|XP_017017.1| (XM_017017)  protein FLJ23056 [Homo sapiens]         317          315/317 (99%)    315/317 (99%)    e-149
gi|10864041|ref|NP_067053.1| (NM_021230)  myeloid/lymphoid or mixed-lineage       4025         340/802 (42%)    467/802 (57%)    e-135
                                          leukemia 3; ALR-like protein
                                          [Homo sapiens]



Tables 47D-H list the domain descriptions from DOMAIN analysis results against 
NOV47. This indicates that the NOV47 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 47D. Domain Analysis of NOV47

gnl|Smart|smart00282, LamG, Laminin G domain

CD-Length = 135 residues, 98.5% aligned

Score = 85.1 bits (209), Expect = 9e-17



Table 47E. Domain Analysis of NOV47

gnl|Pfam|pfam00628, PHD, PHD-finger. PHD folds into an interleaved
type of Zn-finger chelating 2 Zn ions in a similar manner to that of
the RING and FYVE domains.

CD-Length = 49 residues, 98.0% aligned

Score = 71.2 bits (173), Expect = 1e-12



Table 47F. Domain Analysis of NOV47

gnl|Pfam|pfam00769, ERM, Ezrin/radixin/moesin family. This family of
proteins contain a band 4.1 domain (pfam00373), at their amino
terminus. This family represents the rest of these proteins.

CD-Length = 365 residues, 23.0% aligned

Score = 52.8 bits (125), Expect = 5e-07



Table 47G. Domain Analysis of NOV47

gnl|Pfam|pfam00529, HlyD, HlyD family secretion protein.

CD-Length = 310 residues, 53.9% aligned

Score = 53.9 bits (128), Expect = 2e-07



Table 47H. Domain Analysis of NOV47

gnl|Pfam|pfam02166, Androgen_recep, Androgen receptor.

CD-Length = 456 residues, 18.6% aligned

Score = 52.8 bits (125), Expect = 5e-07



The ALL-1 gene is involved in human acute leukemia through chromosome
translocations or internal rearrangements. ALL-1 is the human homologue of Drosophila
trithorax. The latter is a member of the trithorax group (trx-G) genes which, together with the
Polycomb group (Pc-G) genes, act as positive and negative regulators, respectively, to
determine the body structure of Drosophila. ALR encodes a gigantic 5262 amino acid long
protein containing a SET domain, five PHD fingers, potential zinc fingers, and a very long run
of glutamines interrupted by hydrophobic residues, mostly leucine. The SET motif, PHD
fingers, zinc fingers and two other regions are most similar to domains of ALL-1 and TRX.
The first two motifs are also found in other trx-G and Pc-G proteins. The ALR gene was
mapped to chromosome band 12q12-13, adjacent to the VDR gene. This region is involved in
duplications and translocations associated with cancer.

The human ALL-1/MLL/HRX gene on chromosome 11q23 is the site of many locally
clustered chromosomal alterations associated with several types of acute leukemias, including
deletions, partial duplications and reciprocal translocations. Structurally variant proteins
derived from an altered ALL-1 gene presumably make essential contributions to the malignant
transformation of hematopoietic progenitor cells.

Many haematologic malignancies carry characteristic chromosomal translocations,
which are thought to play an important role in the pathogenesis of these tumours. The t(8;14)
translocation in Burkitt's lymphoma was one of the first characterized at the molecular level.
In this translocation the c-myc oncogene at chromosome 8q24 becomes deregulated by
enhancer elements of the immunoglobulin heavy chain locus at chromosome 14q32, leading to
a very aggressive B cell malignancy. Chromosomal translocations involving the MLL gene
occur in about 80% of infant leukemias.

The disclosed NOV47 nucleic acid of the invention encoding an ALR-like protein
includes the nucleic acid whose sequence is provided in Table 47A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 47A while still encoding a protein that maintains
its ALR-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 1 percent of the
bases may be so changed.

The disclosed NOV47 protein of the invention includes the ALR-like protein whose
sequence is provided in Table 47B. The invention also includes a mutant or variant protein
any of whose residues may be changed from the corresponding residue shown in Table 47B
while still encoding a protein that maintains its ALR-like activities and physiological
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 1
percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this ALR-like protein
(NOV47) may function as a member of an "ALR family". Therefore, the NOV47 nucleic acids
and proteins identified here may be useful in potential therapeutic applications implicated in
(but not limited to) various pathologies and disorders as indicated below. The potential
therapeutic applications for this invention include, but are not limited to: protein therapeutic,
small molecule drug target, antibody target (therapeutic, diagnostic, drug targeting/cytotoxic
antibody), diagnostic and/or prognostic marker, gene therapy (gene delivery/gene ablation),
research tools, tissue regeneration in vivo and in vitro of all tissues and cell types composing
(but not limited to) those defined here.

The NOV47 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the ALR-like protein
(NOV47) may be useful in gene therapy, and the ALR-like protein (NOV47) may be useful
when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from Osteoporosis, involutional; Rickets, vitamin D-resistant; Fibrosis of extraocular muscles,
congenital, 1; Achalasia-addisonianism-alacrimia syndrome; Cataract, polymorphic and
lamellar; acute leukemias, cancers, or other pathologies or conditions. The NOV47 nucleic
acid encoding the ALR-like protein of the invention, or fragments thereof, may further be
useful in diagnostic applications, wherein the presence or amount of the nucleic acid or the
protein are to be assessed.

NOV47 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV47 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV47 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.

NOV48



A disclosed NOV48 nucleic acid of 1988 nucleotides (also referred to as CG57713-01)
encoding a sodium/bile acid transporter-like protein is shown in Table 48A. The start and
stop codons are in bold letters.



Table 48A. NOV48 nucleotide sequence (SEQ ID NO: 113).



GGCGGCGACTGCGGCGACCGCGGGACGGCGAGAGGCACGCGGCGGGAGGGGACCGGAATC
CGCAGCTCCGGCCGCGCCATGGACGGCAACGACAACGTGACCCTGCTCTTCGCCCCTCTG
CTGCGGGACAACTACACCCTGGCGCCCAATGCCAGCAGCCTGGGCCCCGGCACGGACCTC 
GCCCTCGCCCCTGCCTCCAGCGCCGGCCCCGGCCCTGGGCTCAGCCTCGGGCCGGGTCCG 
AGCTTCGGCTTCAGCCCCGGCCCCACTCCGACCCCGGAGCCCACGACCAGCGGCCTCGCG 
GGCGGCGCGGCGAGCCACGGCCCTTCCCCGTTCCCTCGGCCCTGGGCGCCCCACGCGCTC 
CCGTTCTGGGACACGCCGCTGAACCACGGGCTGAACGTGTTCGTGGGCGCCGCCCTGTGC 
ATCACCATGCTGGGCCTGGGCTGCACGGTGGACGTGAACCACTTCGGGGCGCACGTCCGT 
CGGCCCGTGGGCGCGCTGCTGGCAGCGCTCTGCCAGTTCGGCCTCCTGCCGCTGCTGGCC 
TTCCTGCTGGCCCTCGCCTTCAAGCTGGACGAGGTGGCCGCCGTGGCGGTGCTCCTGTGT 
GGCTGCTGTCCCGGCGGCAATCTCTCCAATCTTATGTCCCTGCTGGTTGACGGCGACATG 
AACCTCAGCATCATCATGACCATCTCCTCCACGCTTCTGGCCCTCGTCTTGATGCCCCTG 
TGCCTGTGGATCTACAGCTGGGCTTGGATCAACACCCCTATCGTGCAGTTACTACCCCTA 
GGGACCGTGACCCTGACTCTCTGCAGCACTCTCATACCTATCGGGTTGGGCGTCTTCATT 
CGCTACAAATACAGCCGGGTGGCTGACTACATTGTGAAGGTAAGGCCCGTTTCCCTGTGG 
TCTCTGCTAGTGACTCTGGTGGTCCTTTTCATAATGACCGGCACTATGTTAGGACCTGAA 
CTGCTGGCAAGTATCCCTGCAGCTGTTTATGTGATAGCAATTTTTATGCCTTTGGCAGGC 
TACGCTTCAGGTTATGGTTTAGCTACTCTCTTCCATCTTCCACCCAACTGCAAGAGGACT 
GTATGTCTGGAAACAGGTAGTCAGAATGTGCAGCTCTGTACAGCCATTCTAAAACTGGCC 
TTTCCACCGCAATTCATAGGAAGCATGTACATGTTTCCTTTGCTGTATGCACTTTTCCAG 
TCTGCAGAAGCGGGGATTTTTGTTTTAATCTATAAAATGTATGGAAGTGAAATGTTGCAC 
AAGCGAGATCCTCTAGATGAAGATGAAGATACAGATATTTCTTATAAAAAACTAAAAGAA 
GAGGAAATGGCAGACACTTCCTATGGCACAGTGAAAGCAGAAAATATAATAATGATGGAA
ACCGCTCAGACTTCTCTCTAAATGTGGAGATACACAGGAGCTTCTATCTTGCTGAAATAT
TGCTTCATATTTATAGCCTGTGGTAGTGCACATGGTTAACATAAAAGATAACACTGGTTC
ACATCATACATGTAACAATTCTGATCTTTTTAAGGTTCACTGGTGTATTAACCAAACGTT
GTCACAAATTACAAATCAATGCTGTAATATAATTTGCACCTGGAATGGCTAACGTGAAGC
CTGAATTAAATGTGGTTTTTAGTTTTTACCATCACCAATTTCTATGACTGTTGCAAATAC
AGAATCTATTAGAAAACAGGGTCTTGGAAATGTAGAATTTTGGCGCACTATGAGGAAAAA
CAAGCTATCTTTGTAAAGCATAATTGAGTTTAATGTAATTGTTGTAAAAAAAAAAGTGTG
CTTGCTCTACTTAAAATTCCTCACAATGTTGAATTTTGACCTGTATTCAGAAGAATTCCA
AAACAGGTCAGTTAAATAAGGAAATATAGTATTTGTCAAACCAGTATCAGAGAAAAGTTA
CATTAATGTATTTGATTACTTGATCTGGTATCTACTTATTAATGAATAATCAACATTTTT
CTAGTGAA



In a search of public sequence databases, the NOV48 nucleic acid sequence, located on
chromosome 4, has 250 of 395 bases (63%) identical to a
gb:GENBANK-ID:HUMNTCP|acc:L21893.1 mRNA from Homo sapiens (Human Na/taurocholate
cotransporting polypeptide mRNA, complete cds). Public nucleotide databases include all
GenBank databases and the GeneSeq patent database.

The disclosed NOV48 polypeptide (SEQ ID NO:114) encoded by SEQ ID NO:113
has 440 amino acid residues and is presented in Table 48B using the one-letter amino acid
code. SignalP, PSORT and/or Hydropathy results predict that NOV48 has no signal peptide
and is likely to be localized in the plasma membrane with a certainty of 0.6000.






Table 48B. Encoded NOV48 protein sequence (SEQ ID NO:114). 



MDGNDNVTLLFAPLLRDNYTLAPNASSLGPGTDLALAPASSAGPGPGLSLGPGPSFGFSP
GPTPTPEPTTSGLAGGAASHGPSPFPRPWAPHALPFWDTPLNHGLNVFVGAALCITMLGL 
GCTVDVNHFGAHVRRPVGALLAALCQFGLLPLLAFLLALAFKLDEVAAVAVLLCGCCPGG 
NLSNLMSLLVDGDMNLSIIMTISSTLLALVLMPLCLWIYSWAWINTPIVQLLPLGTVTLT
LCSTLIPIGLGVFIRYKYSRVADYIVKVRPVSLWSLLVTLVVLFIMTGTMLGPELLASIP
AAVYVIAIFMPLAGYASGYGLATLFHLPPNCKRTVCLETGSQNVQLCTAILKLAFPPQFI 
GSMYMFPLLYALFQSAEAGIFVLIYKMYGSEMLHKRDPLDEDEDTDISYKKLKEEEMADT 
SYGTVKAENIIMMETAQTSL
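
The relationship between the nucleotide sequence of Table 48A and the encoded protein of Table 48B can be illustrated by translating the open reading frame from its start codon. The following is a minimal sketch under stated assumptions: it uses the standard genetic code, takes the first ATG in the input as the start codon, and the function name `translate_first_orf` is hypothetical rather than anything named in this disclosure. The example input is the second 60-base line of Table 48A, which contains the start codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_first_orf(dna: str) -> str:
    """Translate from the first ATG until a stop codon or the end of the input.

    Whitespace is ignored, so sequence lines broken up by extraction still work.
    Codons containing characters other than A, C, G, T are reported as 'X'.
    """
    seq = "".join(dna.split()).upper()
    start = seq.find("ATG")
    if start < 0:
        return ""
    residues = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":  # stop codon ends the open reading frame
            break
        residues.append(aa)
    return "".join(residues)


if __name__ == "__main__":
    fragment = "CGCAGCTCCGGCCGCGCCA TGGACGGCAACGACAACGTGACCCTGCTCTTCGCCCCTCTG"
    print(translate_first_orf(fragment))
```

Translating this fragment reproduces the first residues of the NOV48 protein, which is a convenient way to cross-check individual residues in Table 48B against the nucleotide listing.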



A search of sequence databases reveals that the NOV48 amino acid sequence has 126 
of 325 amino acid residues (38%) identical to, and 193 of 325 amino acid residues (59%) 
similar to, the 362 amino acid residue ptnr:SWISSPROT-ACC:O08705 protein from Mus 
musculus (Mouse) (SODIUM/BILE ACID COTRANSPORTER (NA(+)/BILE ACID 
COTRANSPORTER) (NA(+)/TAUROCHOLATE TRANSPORT PROTEIN) 
(SODIUM/TAUROCHOLATE COTRANSPORTING POLYPEPTIDE)). Public amino acid 
databases include the GenBank databases, SwissProt, PDB and PIR. 

NOV48 is expressed in at least Liver. This information was derived by determining the 
tissue sources of the sequences that were included in the invention including but not limited to 
SeqCalling sources, Public EST sources, Literature sources, and/or RACE sources. 

The disclosed NOV48 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 48C.



Table 48C. BLAST results for NOV48 


Gene Index/Identifier                           Protein/Organism                        Length (aa)  Identity (%)     Positives (%)    Expect
gi|15082287|gb|AAH12048.1|AAH12048 (BC012048)   (protein for IMAGE:3502817)             467          437/440 (99%)    437/440 (99%)    0.0
                                                [Homo sapiens]
gi|17512162|gb|AAH19066.1|AAH19066 (BC019066)   (protein for MGC:29802)                 432          437/440 (99%)    437/440 (99%)    0.0
                                                [Homo sapiens]
gi|15294592|ref|XP_053248.1| (XM_053248)        protein XP_053248 [Homo sapiens]        435          358/445 (80%)    365/445 (81%)    e-66
gi|3980315|emb|CAA10360.1| (AJ131361)           hepatic sodium-dependent bile acid      348          125/334 (37%)    192/334 (57%)    8e-54
                                                transporter [Oryctolagus cuniculus]
gi|2811069|sp|O08705|NTCP_MOUSE                 SODIUM/BILE ACID COTRANSPORTER          362          123/312 (39%)    189/312 (60%)    e-52
                                                (NA(+)/BILE ACID COTRANSPORTER)
                                                (NA(+)/TAUROCHOLATE TRANSPORT PROTEIN)
                                                (SODIUM/TAUROCHOLATE COTRANSPORTING
                                                POLYPEPTIDE)





Table 48D lists the domain descriptions from DOMAIN analysis results against 
NOV48. This indicates that the NOV48 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 48D. Domain Analysis of NOV48 

gnl|Pfam|pfam01758, SBF, Sodium Bile acid symporter family. This
family consists of Na+/bile acid co-transporters. These transmembrane
proteins function in the liver in the uptake of bile acids from portal
blood plasma, a process mediated by the co-transport of Na+. Also in
the family is ARC3 from S. cerevisiae; this is a putative transmembrane
protein involved in resistance to arsenic compounds.

CD-Length = 186 residues, 95.7% aligned

Score = 104 bits (259), Expect = 1e-23



A cDNA probe from a cloned rat liver Na+/taurocholate cotransporting polypeptide
(Ntcp) was used to screen a human liver cDNA library. A 1,599-bp cDNA clone that encodes
a human Na+/taurocholate cotransporting polypeptide (NTCP) was isolated. The human
NTCP consists of 349 amino acids (calculated molecular mass of 38 kD) and exhibits 77%
amino acid homology with the rat Ntcp. In vitro translation experiments indicate that the
protein is glycosylated and has a molecular weight similar to the rat Ntcp. Injection of in vitro
transcribed cRNA into Xenopus laevis oocytes resulted in the expression of Na(+)-dependent
taurocholate uptake. Saturation kinetics indicated that the human NTCP has a higher affinity
for taurocholate (apparent Km = 6 microM) than the previously cloned rat protein (apparent
Km = 25 microM). NTCP-mediated taurocholate uptake into oocytes was inhibited by all
major bile acid derivatives (100 microM), bumetanide (500 microM), and
bromosulphophthalein (100 microM). Southern blot analysis of genomic DNA from a panel of
human/hamster somatic cell hybrids mapped the human NTCP gene to chromosome 14.
PMID: 8132774
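
The practical meaning of the two apparent Km values quoted above can be illustrated numerically. The following is a minimal sketch, assuming that uptake follows simple Michaelis-Menten saturation kinetics, v/Vmax = S / (Km + S); that model, the function name `fractional_uptake`, and the chosen substrate concentrations are illustrative assumptions and are not taken from the cited study.

```python
# Apparent Km values for taurocholate quoted in the text (micromolar).
KM_HUMAN_NTCP_UM = 6.0
KM_RAT_NTCP_UM = 25.0


def fractional_uptake(substrate_um: float, km_um: float) -> float:
    """Return v/Vmax = S / (Km + S) for a simple saturable transporter."""
    if substrate_um < 0 or km_um <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return substrate_um / (km_um + substrate_um)


if __name__ == "__main__":
    # Illustrative taurocholate concentrations (micromolar); not from the source.
    for s in (1.0, 6.0, 25.0, 100.0):
        human = fractional_uptake(s, KM_HUMAN_NTCP_UM)
        rat = fractional_uptake(s, KM_RAT_NTCP_UM)
        print(f"S = {s:6.1f} uM  human v/Vmax = {human:.2f}  rat v/Vmax = {rat:.2f}")
```

Under this model a transporter runs at half its maximal rate when the substrate concentration equals its Km, so the lower Km of the human protein corresponds to a higher fraction of maximal uptake at any given low taurocholate concentration.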

Liver parenchymal cells continuously extract high amounts of bile acids from portal
blood plasma. This uptake process is mediated by a Na+/bile acid cotransport system. A
cDNA encoding the rat liver bile acid uptake system has been isolated by expression cloning
in Xenopus laevis oocytes. The cloned transporter is strictly sodium-dependent and can be
inhibited by various non-bile-acid organic compounds. Sequence analysis of the cDNA
revealed an open reading frame of 1086 nucleotides coding for a protein of 362 amino acids
(calculated molecular mass 39 kDa) with five possible N-linked glycosylation sites and seven
putative transmembrane domains. Translation experiments in vitro and in oocytes indicate that
the transporter is indeed glycosylated and that its polypeptide backbone has an apparent
molecular mass of 33-35 kDa. Northern blot analysis with the cloned probe revealed
crossreactivity with mRNA species from rat kidney and intestine as well as from liver tissues
of mouse, guinea pig, rabbit, and man. PMID: 1961729
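
The open reading frame length and protein size quoted above are related by the three-bases-per-codon rule, and the calculated mass can be roughly approximated from the residue count. The following is a minimal sketch; the average residue mass of 110 Da is a common rule of thumb assumed here for illustration, not a value from the source, and the function names are hypothetical.

```python
# Rule-of-thumb average mass of one amino acid residue in daltons (an assumption).
AVERAGE_RESIDUE_MASS_DA = 110.0


def residues_from_orf(coding_nucleotides: int) -> int:
    """Number of encoded residues for a coding region given without its stop codon."""
    if coding_nucleotides <= 0 or coding_nucleotides % 3 != 0:
        raise ValueError("coding length must be a positive multiple of 3")
    return coding_nucleotides // 3


def approx_mass_kda(residues: int) -> float:
    """Rough polypeptide mass in kDa from the residue count alone."""
    return residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    n = residues_from_orf(1086)  # open reading frame length quoted in the text
    print(n, "residues, roughly", round(approx_mass_kda(n), 1), "kDa")
```

For the rat liver transporter this gives 362 residues and an estimate close to the calculated mass of 39 kDa quoted above; the exact value depends on the actual amino acid composition, which this approximation ignores.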

Using expression cloning in Xenopus laevis oocytes, a cDNA encoding a rat liver
organic anion-transporting polypeptide (oatp) was isolated. The cloned oatp mediated
Na(+)-independent uptake of sulfobromophthalein (BSP) which was Cl(-)-dependent in the presence
of bovine serum albumin (BSA) at low BSP concentrations (e.g., 2 microM). Addition of
increasing amounts of BSA had no effects on the maximal velocity of initial BSP uptake, but it
increased the Km value from 1.5 microM (no BSA) to 24 microM (BSA/BSP molar ratio, 3.7)
and 35 microM (BSA/BSP ratio, 18.4). In addition to BSP, the cloned oatp also mediated
Na(+)-independent uptake of conjugated (taurocholate) and unconjugated (cholate) bile acids.
Sequence analysis of the cDNA revealed an open reading frame of 2010 nucleotides coding
for a protein of 670 amino acids (calculated molecular mass, 74 kDa) with four possible
N-linked glycosylation sites and 10 putative transmembrane domains. Translation experiments in
vitro indicated that the transporter was indeed glycosylated and that its polypeptide backbone
had an apparent molecular mass of 59 kDa. Northern blot analysis with the cloned probe
revealed crossreactivity with several mRNA species from rat liver, kidney, brain, lung,
skeletal muscle, and proximal colon as well as from liver tissues of mouse and rabbit, but not
of skate (Raja erinacea) and human. PMID: 8278353

Active uptake of bile acids from the lumen of the small intestine is mediated by an ileal
Na(+)-dependent bile acid transport system. To identify components of this transport system,
an expression cloning strategy was employed to isolate a hamster ileal cDNA that exhibits bile
acid transport activity. By Northern blot analysis, mRNA for the cloned transporter was
readily detected in ileum and kidney but was absent from liver and proximal small intestine.
The transporter cDNA encoded a 348-amino acid protein with seven potential transmembrane
domains and three possible N-linked glycosylation sites. The amino acid sequence was 35%
identical and 63% similar to the rat liver Na+/bile acid cotransporter. After transfection into
COS cells, the hamster cDNA transported taurocholate in a strict Na(+)-dependent fashion
with an apparent Km of 33 microM. This taurocholate transport was inhibited by various bile
acids but not by taurine or other organic anions. The Na+ dependence, saturability, and bile
acid specificity of transport as well as the tissue specificity of mRNA expression strongly
argue that the transporter cDNA characterized in this study is the Na+/bile acid cotransporter
described previously in ileum. PMID: 8288599

Uptake of long-chain and aromatic neutral amino acids into cells is known to be
catalyzed by the Na(+)-independent system L transporter, which is ubiquitous in animal cells
and tissues. The 2.3-kilobase cDNA codes for a protein of 683 amino acids. The transporter
has four putative membrane-spanning domains and bears no sequence or structural homology
to any known animal or bacterial transporter. When transcribed and expressed in Xenopus
oocytes, the transporter exhibits many, but not all, of the characteristics of L-system
transporters, suggesting that this represents one of several related L-system transporters.
PMID: 1729674

The phylogenic and ontogenic expression of mRNA for the Na+/bile acid cotransporter
was determined by Northern analysis utilizing a full-length cDNA probe recently cloned from
rat liver. mRNA was detected in several mammalian species, including rat, mouse, and man,
but could not be found in livers from nonmammalian species, including chicken, turtle, frog,
and small skate. When expression of the bile acid transporter in developing rat liver was
studied, mRNA was detected between 18 and 21 days of gestation, at the time when
Na(+)-dependent bile acid transport is first detected. Two hepatoma cell lines (HTC and HepG2), the
latter of which is known to have lost the Na+/bile acid cotransport system, also did not express
mRNA for this transporter. Finally, when mRNA from the lower vertebrate (the small skate)
was injected into Xenopus oocytes, only a sodium-independent, chloride-dependent transport
system for bile acids was expressed, confirming the integrity of the mRNA and consistent with
prior functional studies of bile acid transport in this species. These findings establish that the
Na+/bile acid cotransport mRNA is first transcribed in mammalian species, a process that is
recapitulated late during mammalian fetal development in rat liver, and that this mRNA is lost
in dedifferentiated hepatocytes. In contrast, the mRNA for a multispecific Na(+)-independent
organic anion transport system is transcribed earlier in vertebrate evolution. PMID: 8421672

The disclosed NOV48 nucleic acid of the invention encoding a sodium/bile acid
transporter-like protein includes the nucleic acid whose sequence is provided in Table 48A or
a fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose
bases may be changed from the corresponding base shown in Table 48A while still encoding a
protein that maintains its sodium/bile acid transporter-like activities and physiological
functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids
whose sequences are complementary to those just described, including nucleic acid fragments
that are complementary to any of the nucleic acids just described. The invention additionally
includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures
include chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 37 percent of the bases may be so changed.

The disclosed NOV48 protein of the invention includes the sodium/bile acid
transporter-like protein whose sequence is provided in Table 48B. The invention also includes
a mutant or variant protein any of whose residues may be changed from the corresponding
residue shown in Table 48B while still encoding a protein that maintains its sodium/bile acid
transporter-like activities and physiological functions, or a functional fragment thereof. In the
mutant or variant protein, up to about 62 percent of the residues may be so changed.
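
The variation limits stated above appear to be the complement of the best database identities reported earlier for NOV48 (63% at the nucleic acid level, 38% at the protein level). The following is a minimal sketch of that arithmetic; the interpretation is an inference from the numbers in this section rather than an explicit statement of the disclosure, and the function name is hypothetical.

```python
def max_changed_percent(identity_percent: int) -> int:
    """Percent of positions that may differ, taken as 100 minus percent identity."""
    if not 0 <= identity_percent <= 100:
        raise ValueError("identity must be between 0 and 100 percent")
    return 100 - identity_percent


if __name__ == "__main__":
    # Identity values quoted in the text for NOV48.
    print("nucleic acid:", max_changed_percent(63), "percent of bases")
    print("protein:", max_changed_percent(38), "percent of residues")
```

The same arithmetic applied to the 99% identities reported for NOV47 gives the "up to about 1 percent" figure used in the corresponding NOV47 paragraphs.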

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this sodium/bile acid
transporter-like protein (NOV48) may function as a member of a "sodium/bile acid transporter
family". Therefore, the NOV48 nucleic acids and proteins identified here may be useful in
potential therapeutic applications implicated in (but not limited to) various pathologies and
disorders as indicated below. The potential therapeutic applications for this invention include,
but are not limited to: protein therapeutic, small molecule drug target, antibody target
(therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic
marker, gene therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo
and in vitro of all tissues and cell types composing (but not limited to) those defined here.

The NOV48 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the sodium/bile acid
transporter-like protein (NOV48) may be useful in gene therapy, and the sodium/bile acid
transporter-like protein (NOV48) may be useful when administered to a subject in need
thereof. By way of nonlimiting example, the compositions of the present invention will have
efficacy for treatment of patients suffering from cancer, trauma, regeneration (in vitro and in
vivo), viral/bacterial/parasitic infections, Von Hippel-Lindau (VHL) syndrome,
cirrhosis, transplantation, or other pathologies or conditions. The NOV48 nucleic acid
encoding the sodium/bile acid transporter-like protein of the invention, or fragments thereof,
may further be useful in diagnostic applications, wherein the presence or amount of the nucleic
acid or the protein are to be assessed.

NOV48 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV48 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV48 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 

NOV49 

A disclosed NOV49 nucleic acid of 2313 nucleotides (also referred to as CG57721-01) 
encoding a prestin-like protein is shown in Table 49A. The start and stop codons are in bold 
letters. 



Table 49A. NOV49 nucleotide sequence (SEQ ID NO: 115). 

CCATTGACTGCAGGAAGGTTGGCCAGCAGAGCAAATGCCATGCCTGCAGAGGACAACGAG
ACTGAAGCTCAGCAGCATGACCGCTACGTGGTAGACAGAGCCGCATACTCCCTTACCCTC 
TTCGACGATGAGTTTGAGAAGAAGGACCGGACATACCCAGTGGGAGAGAAACTTCGCAAT
GCCTTCAGATGTTCCTCAGCCAAGATCAAAGCTGTGGTGTTTGGGCTGCTGCCTGTGCTC 
TCCTGGCTCCCCAAGTACAAGATTAAAGACTACATCATTCCTGACCTGCTCGGTGGACTC 
AGCGGGGGATCCATCCAGGTCCCACAAGGTATGGCATTTGCTCTGCTGGCCAACCTTCCT 
GCAGTCAATGGCCTCTACTCCTCCTTCTTCCCCCTCCTGACCTACTTCTTCCTGGGGGGT 
GTTCACCAGATGGTGCCAGGTACCTTTGCCGTTATCAGCATCCTGGTGGGTAACATCTGT 
CTGCAGCTGGCCCCAGAGTCGAAATTCCAGGTCTTCAACAATGCCACCAATGAGAGCTAT 
GTGGACACAGCAGCCATGGAGGCTGAGAGGCTGCACGTGTCAGCTACGCTAGCCTGCCTC 
ACCGCCATCATCCAGATGGGTCTGGGCTTCATGCAGTTTGGCTTTGTGGCCATCTACCTC 
TCCGAGTCCTTCATCCGGGGCTTCATGACGGCCGCCGGCCTGCAGATCCTGATTTCGGTG 
CTCAAGTACATCTTCGGACTGACCATCCCCTCCTACACAGGCCCAGGGTCCATCGTCTTT 
GTGAGTCTGGGGATGTGCAAAAACCTCCCCCACACCAACATCGCCTCGCTCATCTTCGCT 
CTCATCAGCGGTGCCTTCCTGGTGCTGGTGAAGGAGCTCAATGCTCGCTACATGCACAAG 
ATTCGCTTCCCCATCCCTACAGAGATGATTGTGGTAAGGACCTTGTTCAGAGCTGGGTGT 
AAGATGCCCAAAAAGTATCACATGCAGATCGTGGGAGAAATCCAACTCGGCAGGTTCCCC
ACCCCGGTGTCGCCTGTGGTCTCACAGTGGAAGGACATGATAGGCACAGCCTTCTCCCTA 
GCCATCGTGAGCTACGTCATCAACCTGGCTATGGGCCGGACCCTGGCCAACAAGCACGGC 
TACGACGTGGATTCGAACCAGGAGATGATCGCTCTCGGCTGCAGCAACTTCTTTGGCTCC 
TTCTTTAAAATTCATGTCATTTGCTGTGCGCTTTCTGTCACTCTGGCTGTGGATGGAGCT 
GGAGGAAAATCCCAGGTGAGCCTTGTTCTAGGGGAGTTGTCTGAGCTCCCCTTCTTACTC 
ACCACGGGGTTTGCTTTAAGAGTACTCAGGTGTCTCTCTGTGCTAGGAGCCCTGATCGCT 
GTCAATCTCAAGAACTCCCTCAAGCAACTCACCGACCCCTACTACCTGTGGAGGAAGAGC
AAGCTGGACCAGTGCATCTGGGTAGTGAGCTTCCTCTCCTCCTTCTTCCTCAGCCTGCCC
TATGGTGTGGCAGTGGGTGTCGCCTTCTCCGTCCTGGTCGTGGTCTTCCAGACTCAGAGT 
CGAAATGGCTATGCACTGGCCCAGGTCATGGACACTGACATTTATGTGAATCCCAAGACC 
TATAATAGGGTACAGGATATCCAGGGGATTAAAATCATCACGTACTGCTCCCCTCTCTAC 
TTTGCCAACTCAGAGATCTTCAGGCAAAAGGTCATCGCCAAGGTAAGGCTCAGTCCCTGG 
CGACCAGAGGCTCTGGACAGAGAGTGGCCGGAAAATGGAAGCAGAAGGGCGGTGGGACCC 
AACAACAACCAGACCCCGGCTAACGGCACCAGCGTGTCCTATATCACCTTCAGCCCTGAC 
AGCTCCTCACCTGCCCAGAGTGAGCCACCAGCCTCCGCTGAGGCCCCCGGCGAGCCCAGT 
GACATGCTGGCCAGCGTCCCACCCTTCGTCACCTTCCACACCCTCATCCTCGACATGAGT 
GGAGTCAGCTTCGTGGACTTGATGGGCATCAAGGCCCTGGCCAAGCTGAGCTCCACCTAT 
GGGAAGATCGGCGTAAAGGTCTTCTTGGTGAACATCCATGCCCAGGTGTACAATGACATT 
AGCCATGGAGGCGTCTTTGAGGATGGGAGCCTAGGATGCAAGCACGTCTTTCCCAGCATA
CATGACGCAGTCCTCTTTGCCCAGCTGATTCAGTTACCTGGATTGAGGTCACTGGCAATG 
GCTGAAGTGGAGACGCAGGTGGAACTGGTTCAGGCCGGGGGAATCACCCACTTGAGTTTG 
TACTAAAAGCCCCAGCCCAGCCCTGTTTCTCTT 



In a search of public sequence databases, the NOV49 nucleic acid sequence, located on 
chromosome 3, has 966 of 1618 bases (59%) identical to a
gb:GENBANK-ID:AF279265|acc:AF279265.1 mRNA from Homo sapiens (Homo sapiens putative anion
transporter 1 mRNA, complete cds). Public nucleotide databases include all GenBank 
databases and the GeneSeq patent database. 
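
The identity figure quoted above can be reproduced with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the reported whole-number percentages are truncated rather than rounded, which is an inference from the figures themselves (966/1618 is about 59.7%, reported as 59%).

```python
def percent_identity(matches: int, aligned: int) -> int:
    """Whole-number percent identity over an aligned region.

    Assumption: the value is truncated (floor), not rounded, which is
    what the reported figure of 59% for 966/1618 suggests.
    """
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return matches * 100 // aligned


# NOV49 nucleic acid versus AF279265: 966 of 1618 bases identical.
print(percent_identity(966, 1618))
```

The same helper applies to amino acid alignments, since only the counts of matching and aligned positions are needed.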

The disclosed NOV49 polypeptide (SEQ ID NO:116) encoded by SEQ ID NO:115
has 748 amino acid residues and is presented in Table 49B using the one-letter amino acid
code. SignalP, PSORT and/or hydropathy results predict that NOV49 has a signal peptide and
is likely to be localized at the plasma membrane with a certainty of 0.6000. The most likely 
cleavage site for a NOV49 peptide is between amino acids 8 and 9. 
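
The relationship between the nucleotide sequence of Table 49A and the encoded polypeptide can be sketched as a translation with the standard genetic code. The code below is a minimal illustration, not the procedure used to generate the tables. The function name is hypothetical, and the start offset of 39 (0-based) is an assumption inferred by matching the first line of Table 49A against the N-terminus shown in Table 49B; note that an earlier out-of-frame ATG occurs upstream, so the start position has to be supplied explicitly.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, start: int) -> str:
    """Translate from a given 0-based start position to the first stop codon."""
    dna = "".join(dna.split()).upper()  # drop whitespace left by text extraction
    protein = []
    for i in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":  # stop codon ends the open reading frame
            break
        protein.append(residue)
    return "".join(protein)


# First 60 bases of Table 49A; start offset 39 is an inferred assumption.
FIRST_LINE = "CCATTGACTGCAGGAAGGTTGGCCAGCAGAGCAAATGCCATGCCTGCAGAGGACAACGAG"
print(translate(FIRST_LINE, 39))
```

Run on the first line alone, the sketch yields the first seven residues of the polypeptide; supplying the full sequence would continue until the stop codon.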



Table 49B. Encoded NOV49 protein sequence (SEQ ID NO:116). 



MPAEDNETEAQQHDRYVVDRAAYSLTLFDDEFEKKDRTYPVGEKLRNAFRCSSAKIKAVV
FGLLPVLSWLPKYKIKDYIIPDLLGGLSGGSIQVPQGMAFALLANLPAVNGLYSSFFPLL
TYFFLGGVHQMVPGTFAVISILVGNICLQLAPESKFQVFNNATNESYVDTAAMEAERLHV
SATLACLTAIIQMGLGFMQFGFVAIYLSESFIRGFMTAAGLQILISVLKYIFGLTIPSYT
GPGSIVFVSLGMCKNLPHTNIASLIFALISGAFLVLVKELNARYMHKIRFPIPTEMIVVR
TLFRAGCKMPKKYHMQIVGEIQLGRFPTPVSPVVSQWKDMIGTAFSLAIVSYVINLAMGR
TLANKHGYDVDSNQEMIALGCSNFFGSFFKIHVICCALSVTLAVDGAGGKSQVSLVLGEL
SELPFLLTTGFALRVLRCLSVLGALIAVNLKNSLKQLTDPYYLWRKSKLDQCIWVVSFLS
SFFLSLPYGVAVGVAFSVLVVVFQTQSRNGYALAQVMDTDIYVNPKTYNRVQDIQGIKII
TYCSPLYFANSEIFRQKVIAKVRLSPWRPEALDREWPENGSRRAVGPNNNQTPANGTSVS
YITFSPDSSSPAQSEPPASAEAPGEPSDMLASVPPFVTFHTLILDMSGVSFVDLMGIKAL
AKLSSTYGKIGVKVFLVNIHAQVYNDISHGGVFEDGSLGCKHVFPSIHDAVLFAQLIQLP
GLRSLAMAEVETQVELVQAGGITHLSLY



A search of sequence databases reveals that the NOV49 amino acid sequence has 277 
of 732 amino acid residues (37%) identical to, and 434 of 732 amino acid residues (59%) 
similar to, the 744 amino acid residue ptnr:TREMBLNEW-ACC:CAC21555 protein from 
Rattus norvegicus (Rat) (PRESTIN). Public amino acid databases include the GenBank 
databases, SwissProt, PDB and PIR. 




NOV49 is expressed in at least Parotid Salivary glands. Expression information was
derived from the tissue sources of the sequences that were included in the derivation of the
sequence of CG57721-01. The sequence is predicted to be expressed in the following tissues
because of the expression pattern of (GENBANK-ID:
gb:GENBANK-ID:AF279265|acc:AF279265.1), a closely related Homo sapiens putative anion
transporter 1 mRNA, complete cds, homolog in species Homo sapiens: kidney and pancreas. This
information was derived by determining the tissue sources of the sequences that were included
in the invention including but not limited to SeqCalling sources, Public EST sources,
Literature sources, and/or RACE sources.

The disclosed NOV49 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 49C. 



Table 49C. BLAST results for NOV49

Gene Index/Identifier                              Protein/Organism                                    Length (aa)  Identity (%)    Positives (%)   Expect
gi|16588681|gb|AAL26867.1|AF314958_1 (AF314958)    anion transporter/exchanger-9 [Homo sapiens]        887          642/739 (86%)   657/739 (88%)   0.0
gi|16418413|ref|NP_443166.1| (NM_052934)           solute carrier family 26, member 9 [Homo sapiens]   791          642/739 (86%)   657/739 (88%)   0.0
gi|150H891|ref|NP_109652.2| (NM_030727)            prestin (motor protein) [Mus musculus]              744          281/748 (37%)   430/748 (56%)   e-129
gi|13540646|ref|NP_110467.1| (NM_030840)           prestin [Rattus norvegicus]                         744          278/746 (37%)   435/746 (58%)   e-128
gi|8050590|gb|AAF71715.1|AF230376_1 (AF230376)     prestin [Meriones unguiculatus]                     744          271/738 (36%)   424/738 (56%)   e-125

Tables 49D-E list the domain descriptions from DOMAIN analysis results against 
NOV49. This indicates that the NOV49 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 49D. Domain Analysis of NOV49 

gnl|Pfam|pfam00916, Sulfate_transp, Sulfate transporter family.
Mutations in human Diastrophic Dysplasia Protein lead to several diseases.
CD-Length = 312 residues, 100.0% aligned

Score = 167 bits (423), Expect = 2e-42

Table 49E. Domain Analysis of NOV49



gnl|Pfam|pfam01740, STAS, STAS domain. The STAS (after Sulphate
Transporter and AntiSigma factor antagonist) domain is found in the
C-terminal region of Sulphate transporters and bacterial antisigma
factor antagonists. It has been suggested that this domain may have a
general NTP binding function.

CD-Length = 106 residues, 62.3% aligned

Score = 57.0 bits (136), Expect = 4e-09 



A second distinct family of anion transporters, in addition to the classical SLC4 (or
AE) family, has recently been delineated. Members of the SLC26 family are structurally well
conserved and can mediate the electroneutral exchange of Cl(-) for HCO3(-) across the
plasma membrane of mammalian cells like members of the SLC4 family. Three human
transporter proteins have been functionally characterized: SLC26A2 (DTDST), SLC26A3
(CLD or DRA), and SLC26A4 (PDS) can transport with different specificities the chloride,
iodide, bicarbonate, oxalate, and hydroxyl anions, whereas SLC26A5 (prestin) was suggested
to act as the motor protein of the cochlear outer hair cell.

Electromotility, i.e., the ability of cochlear outer hair cells (OHCs) to contract and
elongate at acoustic frequencies, is presumed to depend on the voltage-driven conformational
changes of "motor" proteins present in the OHC lateral plasma membrane. Recently, two
membrane proteins have been proposed as candidates for the OHC motor. A sugar transporter,
GLUT-5, was proposed based on its localization in the OHCs and on the observation that
sugar transport alters the voltage sensitivity of the OHC motor mechanism. Another candidate,
"prestin," was identified from a subtracted OHC cDNA library and shown to impart
voltage-driven shape changes to transfected cultured cells.

The disclosed NOV49 nucleic acid of the invention encoding a prestin-like protein
includes the nucleic acid whose sequence is provided in Table 49A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 49A while still encoding a protein that maintains
its prestin-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 41 percent of the
bases may be so changed.



The disclosed NOV49 protein of the invention includes the prestin-like protein whose
sequence is provided in Table 49B. The invention also includes a mutant or variant protein
any of whose residues may be changed from the corresponding residue shown in Table 49B
while still encoding a protein that maintains its prestin-like activities and physiological
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 63
percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this prestin-like protein
(NOV49) may function as a member of a "prestin family". Therefore, the NOV49 nucleic
acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV49 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the prestin-like protein
(NOV49) may be useful in gene therapy, and the prestin-like protein (NOV49) may be useful
when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from Autoimmune disease, Renal artery stenosis, Interstitial nephritis, Glomerulonephritis,
Polycystic kidney disease, Systemic lupus erythematosus, Renal tubular acidosis, IgA
nephropathy, Hypercalcemia, Lesch-Nyhan syndrome, Diabetes, Von Hippel-Lindau (VHL)
syndrome, Pancreatitis, Obesity, Xerostomia, cancer, trauma, regeneration (in vitro and in
vivo), viral/bacterial/parasitic infections, or other pathologies or conditions. The NOV49
nucleic acid encoding the prestin-like protein of the invention, or fragments thereof, may
further be useful in diagnostic applications, wherein the presence or amount of the nucleic acid
or the protein is to be assessed.

NOV49 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV49 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV49 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 
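
The use of hydrophobicity charts to pick hydrophilic regions as candidate immunogens can be sketched with a sliding-window average. The example below is illustrative only: it uses the Kyte-Doolittle hydropathy scale, which the text does not name, and the window length of 9 and the threshold of 0 are arbitrary assumptions. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophilic_windows(protein: str, window: int = 9, threshold: float = 0.0):
    """Return (0-based start, peptide) for windows scoring below the threshold."""
    profile = hydropathy_profile(protein, window)
    return [
        (i, protein[i:i + window])
        for i, score in enumerate(profile)
        if score < threshold
    ]


# First 16 residues of the NOV49 polypeptide in Table 49B.
N_TERMINUS = "MPAEDNETEAQQHDRY"
for start, peptide in hydrophilic_windows(N_TERMINUS):
    print(start, peptide)
```

Windows with strongly negative averages are the hydrophilic stretches; whether any such stretch is a suitable immunogen would still need to be established experimentally.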



NOV50 

A disclosed NOV50 nucleic acid of 1335 nucleotides (also referred to as CG57787-01) 
encoding a sulfate transporter-like protein is shown in Table 50A. The start and stop codons 
are in bold letters. 



Table 50A. NOV50 nucleotide sequence (SEQ ID NO:117). 

GCCTTCCTGTCCGGCTGCATCCAGCTGGCCATGGGGGTCCTGCGTTTGGGGTTCCTGCTGGACTTCATTT 
CCTACCCCGTCATTAAAGGCTTCACCTCTGCTGCTGCCGTCACCATCGGCTTTGGACAGATCAAGAACCT 
GCTGGGACTACAGAACATCCCCAGGCCGTTCTTCCTGCAGGTGTACCACACCTTCCTCAGGATTGCAGAG 
ACCAGGGTAGGTGACGCCGTCCTGGGGCTGGTCTGCATGCTGCTGCTGCTGGTGCTGAAGCTGATGCGGG 
ACCACGTGCCTCCCGTCCACCCCGAGATGCCCCCTGGTGTGCGGCTCAGCCGTGGGCTGGTCTGGGCTGC 
CACGACAGCTCGCAACGCCCTGGTGGTCTCCTTCGCAGCCCTGGTTGCGTACTCCTTCGAGGTGACTGGA 
TACCAGCCTTTCATCCTAACAGGGGAGACAGCTGAGGGGCTCCCTCCAGTCCGGATCCCGCCCTTCTCAG 
TGACCACAGCCAACGGGACGATCTCCTTCACCGAGATGGTGCAGGACATGGGAGCCGGGCTGGCCGTGGT 
GCCCCTGATGGGCCTCCTGGAGAGCATTGCGGTGGCCAAAGCCTTCGCATCTCAGAATAATTACCGCATC
GATGCCAACCAGGAGCTGCTGGCCATCGGTCTCACCAACATGTTGGGCTCCCTCGTCTCCTCCTACCCGG 
TCACAGGCAGCTTTGGACGGACAGCCGTGAACGCTCAGTCGGGGGTGTGCACCCCGGCGGGGGGCCTGGT 
GACGGGAGTGCTGGTGCTGCTGTCTCTGGACTACCTGACCTCACTGTTCTACTACATCCCCAAGTCTGCC 
CTGGCTGCCGTCATCATCATGGCCGTGGCCCCGCTGTTCGACACCAAGATCTTCAGGACGCTCTGGCGTG 
TTAAGAGGCTGGACCTGCTGCCCCTGTGCGTGACCTTCCTGCTGTGCTTCTGGGAGGTGCAGTACGGCAT 
CCTGGCCGGGGCCCTGGTGTCTCTGCTCATGCTCCTGCACTCTGCAGCCAGGCCTGAGACCAAGGTGTCA 
GAGGGGCCGGTTCTGGTCCTGCAGCCGGCCAGCGGCCTGTCCTTCCCTGTCCTCTGCCCCCCACTCCCTG 
CTGTTCAGGACCCCAAGACCCTGTCCCCGACGCTCTCCAGTCCACAAGGATGCAGGCATCTCTGAGTGGG 
CTGGACCGTCCTCTGTGGGCCTCAGCCAGTGGTGCTGCAGCAAGGGTGGTGGCTCCCCACATATCACTCC 
TTCCCTGCCCCTAAAGTCCGGTTCCTGTTTCTGGGGGGTTGATTTTAGGGGAGCTAAGGGCCTGTGAGTC 
CTAGT 



In a search of public sequence databases, the NOV50 nucleic acid sequence has 585 of 
993 bases (58%) identical to a gb:GENBANK-ID:AF297659|acc:AF297659.2 mRNA from 
Homo sapiens (Homo sapiens sulfate/anion transporter SAT-1 protein (SLC26A1) mRNA, 
complete cds). Public nucleotide databases include all GenBank databases and the GeneSeq 
patent database.

The disclosed NOV50 polypeptide (SEQ ID NO:118) encoded by SEQ ID NO:117
has 384 amino acid residues and is presented in Table 50B using the one-letter amino acid
code. SignalP, PSORT and/or hydropathy results predict that NOV50 has a signal peptide and
is likely to be localized extracellularly with a certainty of 0.6000. The most likely cleavage site
for a NOV50 peptide is between amino acids 20 and 21.
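
A predicted cleavage site "between amino acids 20 and 21" translates into a simple split of the polypeptide. The sketch below is a hypothetical helper, not part of any prediction tool; it takes the 1-based position of the last signal-peptide residue and is applied to the first 30 residues of the NOV50 sequence in Table 50B.

```python
def split_at_cleavage_site(protein: str, last_signal_residue: int):
    """Split a polypeptide into (signal peptide, mature chain).

    last_signal_residue is the 1-based position of the final residue
    of the signal peptide, e.g. 20 for a site between residues 20 and 21.
    """
    if not 0 < last_signal_residue < len(protein):
        raise ValueError("cleavage site must fall inside the sequence")
    return protein[:last_signal_residue], protein[last_signal_residue:]


# First 30 residues of the NOV50 polypeptide.
N_TERMINUS = "MGVLRLGFLLDFISYPVIKGFTSAAAVTIG"
signal, mature = split_at_cleavage_site(N_TERMINUS, 20)
print(signal)
print(mature)
```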



Table 50B. Encoded NOV50 protein sequence (SEQ ID NO:118).



MGVLRLGFLLDFISYPVIKGFTSAAAVTIGFGQIKNLLGLQNIPRPFFLQVYHTFLRIAE 
TRVGDAVLGLVCMLLLLVLKLMRDHVPPVHPEMPPGVRLSRGLVWAATTARNALVVSFAA
LVAYSFEVTGYQPFILTGETAEGLPPVRIPPFSVTTANGTISFTEMVQDMGAGLAVVPLM
GLLESIAVAKAFASQNNYRIDANQELLAIGLTNMLGSLVSSYPVTGSFGRTAVNAQSGVC
TPAGGLVTGVLVLLSLDYLTSLFYYIPKSALAAVIIMAVAPLFDTKIFRTLWRVKRLDLL
PLCVTFLLCFWEVQYGILAGALVSLLMLLHSAARPETKVSEGPVLVLQPASGLSFPVLCP 
PLPAVQDPKTLSPTLSSPQGCRHL 



A search of sequence databases reveals that the NOV50 amino acid sequence has 146 
of 339 amino acid residues (43%) identical to, and 210 of 339 amino acid residues (61%) 
similar to, the 595 amino acid residue ptnr:SPTREMBL-ACC:Q9V879 protein from
Drosophila melanogaster (Fruit fly) (CG5002 PROTEIN). Public amino acid databases include
the GenBank databases, SwissProt, PDB and PIR.

NOV50 is expressed in at least Brain, Cerebral Medulla/Cerebral white matter,
Coronary Artery, Epidermis, Frontal Lobe, Hippocampus, Hypothalamus, Kidney, Liver,
Lung, Mammary gland/Breast, Oviduct/Uterine Tube/Fallopian tube, Pituitary Gland, Retina,
Spinal Cord, Spleen, Substantia Nigra, Temporal Lobe, Testis, Umbilical Vein, Whole
Organism. Expression information was derived from the tissue sources of the sequences that
were included in the derivation of the sequence of CG57787-01. The sequence is predicted to
be expressed in the following tissues because of the expression pattern of (GENBANK-ID:
gb:GENBANK-ID:AF297659|acc:AF297659.2), a closely related Homo sapiens sulfate/anion
transporter SAT-1 protein (SLC26A1) mRNA, complete cds, homolog in species Homo
sapiens: liver. This information was derived by determining the tissue sources of the
sequences that were included in the invention including but not limited to SeqCalling sources,
Public EST sources, Literature sources, and/or RACE sources.

The disclosed NOV50 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 50C.

Table 50C. BLAST results for NOV50

Gene Index/Identifier                      Protein/Organism                                          Length (aa)  Identity (%)    Positives (%)   Expect
gi|18485058|ref|XP_080006.1| (XM_080006)   CG5002 [Drosophila melanogaster]                          595          121/341 (35%)   169/341 (49%)   1e-46
gi|7301216|gb|AAF56347.1| (AE003749)       Esp gene product [Drosophila melanogaster]                654          124/368 (33%)   180/368 (48%)   7e-44
gi|17738183|ref|NP_524490.1| (NM_079766)   Epidermal stripes and patches [Drosophila melanogaster]   623          127/370 (34%)   179/370 (48%)   3e-43
gi|7301881|gb|AAF56989.1| (AE003772)       CG7912 gene product [Drosophila melanogaster]             573          120/323 (37%)   170/323 (52%)   4e-42
gi|7300023|gb|AAF55195.1| (AE003708)       CG6125 gene product [Drosophila melanogaster]             640          100/315 (31%)   162/315 (50%)   3e-38

Table 50D lists the domain descriptions from DOMAIN analysis results against
NOV50. This indicates that the NOV50 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 50D. Domain Analysis of NOV50 

gnl|Pfam|pfam00916, Sulfate_transp, Sulfate transporter family.
Mutations in human Diastrophic Dysplasia Protein lead to several diseases.
CD-Length = 312 residues, 95.2% aligned

Score = 163 bits (413), Expect = 1e-41

Lymphocytes continuously recirculate from blood, through lymphoid and other tissues, 
and back through the efferent lymphatics to the blood. The first critical step in lymphocyte 
migration from circulation into lymphoid tissues is the adhesion of lymphocytes to specialized 
postcapillary vascular sites called high endothelial venules (HEV). 

Two vertebrate sulfate transporters that play a role in sulfate incorporation in other
tissues are members of the superfamily of anion exchangers: the diastrophic dysplasia sulfate
transporter (DTDST; 222600), which is mutant in diastrophic dysplasia and certain other
skeletal dysplasias, and downregulated in adenoma (DRA; 126650), which is mutant in
familial chloride diarrhea (214700). These 2 sulfate transporters contain 12
membrane-spanning domains and are sensitive to the anion-exchanger inhibitor DIDS. Girard et al.
(1999) showed that HEV endothelial cells (HEVECs) express 2 functional classes of sulfate
transporters defined by their differential sensitivity to the DIDS anion-exchanger inhibitor.
They reported the molecular characterization of a DIDS-resistant sulfate transporter from
human HEVECs, designated SUT1. SUT1 belongs to the family of sodium-coupled anion
transporters and exhibits 40 to 50% amino acid identity with the rat renal sodium/sulfate
cotransporter NaSi1, as well as with the human and rat sodium/dicarboxylate cotransporters
NADC1/SDCT1 (604148) and NADC3/SDCT2.

The disclosed NOV50 nucleic acid of the invention encoding a sulfate transporter-like
protein includes the nucleic acid whose sequence is provided in Table 50A or a fragment
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may
be changed from the corresponding base shown in Table 50A while still encoding a protein
that maintains its sulfate transporter-like activities and physiological functions, or a fragment
of such a nucleic acid. The invention further includes nucleic acids whose sequences are
complementary to those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 42 percent of the bases may be so changed.
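
A nucleic acid complementary to a disclosed sequence, such as an antisense strand, is conventionally written as the reverse complement. The sketch below is a minimal illustration with a hypothetical function name; it handles only unmodified A, C, G and T bases and is applied to the first twelve bases of Table 50A.

```python
# Watson-Crick pairing for unmodified DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand read in the 5' to 3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


# First twelve bases of the NOV50 sequence in Table 50A.
print(reverse_complement("GCCTTCCTGTCC"))
```

Applying the function twice returns the original sequence, which is a convenient sanity check on any complement computed this way.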

The disclosed NOV50 protein of the invention includes the sulfate transporter-like
protein whose sequence is provided in Table 50B. The invention also includes a mutant or
variant protein any of whose residues may be changed from the corresponding residue shown
in Table 50B while still encoding a protein that maintains its sulfate transporter-like activities
and physiological functions, or a functional fragment thereof. In the mutant or variant protein,
up to about 57 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this sulfate
transporter-like protein (NOV50) may function as a member of a "sulfate transporter family". Therefore,
the NOV50 nucleic acids and proteins identified here may be useful in potential therapeutic
applications implicated in (but not limited to) various pathologies and disorders as indicated
below. The potential therapeutic applications for this invention include, but are not limited to:
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV50 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the sulfate transporter-like
protein (NOV50) may be useful in gene therapy, and the sulfate transporter-like protein
(NOV50) may be useful when administered to a subject in need thereof. By way of
nonlimiting example, the compositions of the present invention will have efficacy for
treatment of patients suffering from diastrophic dysplasia and certain other skeletal dysplasias,
and adenoma, familial chloride diarrhea, CNS disorders, brain disorders including epilepsy,
eating disorders, schizophrenia, ADD; cancer; heart disease; inflammation and autoimmune
disorders including Crohn's disease, IBD, allergies, rheumatoid arthritis and osteoarthritis,
inflammatory skin disorders, blood disorders; psoriasis; colon cancer; leukemia; AIDS;
thalamus disorders; metabolic disorders including diabetes and obesity; lung diseases such as
asthma, emphysema, cystic fibrosis; pancreatic disorders including pancreatic insufficiency
and cancer; and prostate disorders including prostate cancer, or other pathologies or
conditions. The NOV50 nucleic acid encoding the sulfate transporter-like protein of the
invention, or fragments thereof, may further be useful in diagnostic applications, wherein the
presence or amount of the nucleic acid or the protein is to be assessed.

NOV50 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV50 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV50 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.



NOV51



A disclosed NOV51 nucleic acid of 2079 nucleotides (also referred to as CG57785-01) 
encoding a sulfate transporter-like protein is shown in Table 51A. The start and stop codons
are in bold letters. 



Table 51A. NOV51 nucleotide sequence (SEQ ID NO:119). 



ATGGTTGTGGCTGTCACAGATTTTAACTCCTCGGTGCACCAAGAGTACAGATTCCAAATTGCCTATATCT 
TAAAAACTTGTAAGAAAGAAGCAATGGTGAATGTGAATCTGAACACCAGGGAGAGTTCCAGAAAGGGAAT 
TCCCATCAGTTGGTACTACCTAATAATGGGCGTATTGGGTTTGGGCTTCATTGCCACTTACCTTCCGGAG 
TCTGCAATGAGTGCTTACCTGGCTGCTGTGGCACTTCATATCATGCTGTCCCAGCTGACTTTCATCTTTG 
GGATTATGATTAGTTTCCATGCCGGTCCCATCTCCTTCTTCTATGACATAATTAATTACTGTGTAGCTCT 
CCCAAAAGCGAATTCCACCAGCATTCTAGTATTTCTAACTGTTGTTGTTGCTCTGCGAATCAACAAATGT 
ATCAGAATTTCTTTCAATCAGTATCCCATTGAGTTTCCCATGGAATTATTTCTGATTATTGGCTTCACTG 
TGATTGCAAACAAGATAAGCATGGCCACAGAAACCAGCCAGACGCTTATTGACATGATTCCTTATAGCTT 
TCTGCTTCCTGTAACACCAGATTTCAGCCTTCTTCCCAAGATAATTTTACAAGCCTTCTCCTTATCTTTG 
GTGAGCTCCTTTCTGCTCATATTTCTGGGCAAGAAGATTGCCAGTCTTCACAATTACAGTGTCAATTCCA 
ACCAGGATTTAATAGCCATCGGCCTTTGCAATGTCGTCAGTTCATTTTTCAGATCTTGTGTGTTTACTGG 
TGCTATTGCTAGGACTATTATCCAGGATAAATCTGGAGGAAGACAACAGTTTGCATCTCTGGTAGGCGCA 
GGTGTGATGCTGCTCCTGATGGTGAAGATGGGACACTTTTTCTACACACTGCCAAATGTTGATATGGTAA 
AGGTGCCTCTTAAAGAAGAAGAAATTTTCAGCTTGTTTAATTCAAGTGACACCAATCTACAAGGAGGAAA
GATTTGCAGGTGTTTCTGCAACTGTGATGATCTGGAGCCGCTGCCCAGGATTCTTTACACAGAGCGATTT 
GAAAATAAACTGGATCCCGAAGCATCCTCCATTAACCTGATTCACTGCTCACATTTTGAGAGCATGAACA 
CAAGCCAAACTGCATCCGAAGACCAAGTGCCATACACAGTATCGTCCGTGTCTCAGAAAAATCAAGGGCA 
ACAGTATGAGGAGGTGGAGGAAGTTTGGCTTCCTAATAACTCATCAAGAAACAGCTCACCAGGACTGCCT 
GATGTGGCGGAAAGCCAGGGGAGGAGATCACTCATCCCTTACTCAGATGCGTCTCTACTGCCCAGTGTCC 
ACACCATCATCCTGGATTTCTCCATGGTACACTACGTGGATTCACGGGGGTTAGTCGTATTAAGACAGAT 
ATGCAATGCCTTTCAAAACGCCAACATTTTGATACTCATTGCAGGGTGTCACTCTTCCATAGTCAGGGCA
TTTGAGAGGAATGATTTCTTTGACGCTGGCATCACCAAGACCCAGCTGTTCCTCAGCGTTCACGACGCCG 
TGCTGTTTGCCTTGTCAAGGAAGGTCATAGGCTCCTCTGAGTTAAGCATCGATGAATCCGAGACAGTGAT 
ACGGGAAACCTACTCAGAAACAGACAAGAATGACAATTCAAGATATAAAATGAGCAGCAGTTTTCTAGGA 
AGCCAAAAAAATGTAAGTCCAGGCTTCATCAAGATCCAACAGCCTGTAGAAGAGGAGTCGGAGTTGGATT 
TGGAGCTGGAATCAGAACAAGAGGCTGGGCTGGGTCTGGACCTAGACCTGGATCGGGAGCTGGAGCCTGA 
AATGGAGCCCAAGGCTGAGACCGAGACCAAGACCCAGACCGAGATGGAGCCCCAGCCTGAGACTGAGCCT 
GAGATGGAGCCCAACCCCAAATCTAGGCCAAGAGCTCACACTTTTCCTCAGCAGCGTTACTGGCCTATGT 
ATCATCCGTCTATGGCTTCCACCCAGTCTCAGACTCAGACTCGGACATGGTCAGTGGAGAGGAGACGCCA 
TCCTATGGATTCATACTCACCAGAGGGCAACAGCAATGAAGATGTGTAG



In a search of public sequence databases, the NOV51 nucleic acid sequence, located on 
chromosome 6, has 128 of 198 bases (64%) identical to a
gb:GENBANK-ID:AF189262|acc:AF189262.1 mRNA from Rattus norvegicus (Rattus norvegicus GABA-A
receptor epsilon-like subunit (Epsilon) mRNA, complete cds). Public nucleotide databases 
include all GenBank databases and the GeneSeq patent database. 

The disclosed NOV51 polypeptide (SEQ ID NO:120) encoded by SEQ ID NO:119 has
692 amino acid residues and is presented in Table 51B using the one-letter amino acid code.
SignalP, PSORT and/or hydropathy results predict that NOV51 has a signal peptide and is
likely to be localized in the plasma membrane with a certainty of 0.6000. The most likely
cleavage site for a NOV51 peptide is between amino acids 69 and 70.

Table 51B. Encoded NOV51 protein sequence (SEQ ID NO:120).

MVVAVTDFNSSVHQEYRFQIAYILKTCKKEAMVNVNLNTRESSRKGIPISWYYLIMGVLG
LGFIATYLPESAMSAYLAAVALHIMLSQLTFIFGIMISFHAGPISFFYDIINYCVALPKA
NSTSILVFLTVVVALRINKCIRISFNQYPIEFPMELFLIIGFTVIANKISMATETSQTLI
DMIPYSFLLPVTPDFSLLPKIILQAFSLSLVSSFLLIFLGKKIASLHNYSVNSNQDLIAI
GLCNVVSSFFRSCVFTGAIARTIIQDKSGGRQQFASLVGAGVMLLLMVKMGHFFYTLPNV
DMVKVPLKEEEIFSLFNSSDTNLQGGKICRCFCNCDDLEPLPRILYTERFENKLDPEASS
INLIHCSHFESMNTSQTASEDQVPYTVSSVSQKNQGQQYEEVEEVWLPNNSSRNSSPGLP
DVAESQGRRSLIPYSDASLLPSVHTIILDFSMVHYVDSRGLVVLRQICNAFQNANILILI
AGCHSSIVRAFERNDFFDAGITKTQLFLSVHDAVLFALSRKVIGSSELSIDESETVIRET 
YSETDKNDNSRYKMSSSFLGSQKNVSPGFIKIQQPVEEESELDLELESEQEAGLGLDLDL 
DRELEPEMEPKAETETKTQTEMEPQPETEPEMEPNPKSRPRAHTFPQQRYWPMYHPSMAS 
TQSQTQTRTWSVERRRHPMDSYSPEGNSNEDV 



A search of sequence databases reveals that the NOV51 amino acid sequence has 123
of 123 amino acid residues (100%) identical to, and 123 of 123 amino acid residues (100%)
similar to, the 123 amino acid residue ptnr:SPTREMBL-ACC:Q9NQP0 protein from Homo
sapiens (Human) (BA48209.2 (NOVEL SULPHATE TRANSPORTER FAMILY
MEMBER)). Public amino acid databases include the GenBank databases, SwissProt, PDB
and PIR.

NOV51 is expressed in at least Adipose, Peripheral Blood, Spinal Cord, Testis, and
Colon. Expression information was derived from the tissue sources of the sequences that were
included in the derivation of the sequence of CG57785-01. The sequence is predicted to be
expressed in the following tissues because of the expression pattern of (GENBANK-ID:
gb:GENBANK-ID:AF189262|acc:AF189262.1) a closely related Rattus norvegicus GABA-A
receptor epsilon-like subunit (Epsilon) mRNA, complete cds homolog in species Rattus
norvegicus: Adipose, Peripheral Blood, Spinal Cord, Testis, Colon. This information was
derived by determining the tissue sources of the sequences that were included in the invention
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or
RACE sources.

The disclosed NOV51 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 51C.




Table 51C. BLAST results for NOV51

Gene Index/Identifier                             Protein/Organism                                                      Length (aa)  Identity (%)    Positives (%)   Expect
gi|16418457|ref|NP_443193.1| (NM_052961)          solute carrier family 26, member 8 [Homo sapiens]                     970          394/396 (99%)   394/396 (99%)   0.0
gi|16588684|gb|AAL26868.1|AF314959_1 (AF314959)   anion transporter/exchanger-8 [Homo sapiens]                          970          405/464 (87%)   418/464 (89%)   0.0
gi|16552904|dbj|BAB[illegible]                    unnamed protein product [Homo sapiens]                                278          171/182 (93%)   172/182 (93%)   e-92
gi|9453726|emb|CAB99352.1| (AL133507)             bA48209.2 (Novel sulphate transporter family member) [Homo sapiens]   123          123/123 (100%)  123/123 (100%)  3e-62
gi|15080864|gb|AAK51131.1| (AY032863)             chloride-formate exchanger [Mus musculus]                             734          78/251 (31%)    142/251 (56%)   e-35



Tables 51D-E list the domain descriptions from DOMAIN analysis results against
NOV51. This indicates that the NOV51 sequence has properties similar to those of other
proteins known to contain this domain.




Table 51D. Domain Analysis of NOV51 

gnl|Pfam|pfam00916, Sulfate_transp, Sulfate transporter family.
Mutations in human Diastrophic Dysplasia Protein lead to several diseases.
CD-Length = 312 residues, 79.5% aligned

Score = 145 bits (366), Expect = 7e-36



Table 51E. Domain Analysis of NOV51 

gnl|Pfam|pfam01740, STAS, STAS domain. The STAS (after Sulphate
Transporter and AntiSigma factor antagonist) domain is found in the
C-terminal region of Sulphate transporters and bacterial antisigma
factor antagonists. It has been suggested that this domain may have a
general NTP binding function.

CD-Length = 106 residues, 63.2% aligned 

Score = 53.1 bits (126), Expect = 5e-08 



The disclosed NOV51 nucleic acid of the invention encoding a sulfate transporter-like
protein includes the nucleic acid whose sequence is provided in Table 51A or a fragment
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may
be changed from the corresponding base shown in Table 51A while still encoding a protein
that maintains its sulfate transporter-like activities and physiological functions, or a fragment
of such a nucleic acid. The invention further includes nucleic acids whose sequences are
complementary to those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 36 percent of the bases may be so changed.

The disclosed NOV51 protein of the invention includes the sulfate transporter-like
protein whose sequence is provided in Table 51B. The invention also includes a mutant or
variant protein any of whose residues may be changed from the corresponding residue shown
in Table 51B while still encoding a protein that maintains its sulfate transporter-like activities
and physiological functions, or a functional fragment thereof. In the mutant or variant protein,
up to about 0 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this sulfate
transporter-like protein (NOV51) may function as a member of a "sulfate transporter family". Therefore,
the NOV51 nucleic acids and proteins identified here may be useful in potential therapeutic
applications implicated in (but not limited to) various pathologies and disorders as indicated
below. The potential therapeutic applications for this invention include, but are not limited to:
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV51 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the sulfate transporter-like
protein (NOV51) may be useful in gene therapy, and the sulfate transporter-like protein
(NOV51) may be useful when administered to a subject in need thereof. By way of
nonlimiting example, the compositions of the present invention will have efficacy for
treatment of patients suffering from diastrophic dysplasia and certain other skeletal dysplasias,
and adenoma, familial chloride diarrhea, CNS disorders, brain disorders including epilepsy,
eating disorders, schizophrenia, ADD; cancer; heart disease; inflammation and autoimmune
disorders including Crohn's disease, IBD, allergies, rheumatoid and osteoarthritis,
inflammatory skin disorders, blood disorders; psoriasis, colon cancer, leukemia, AIDS;
thalamus disorders; metabolic disorders including diabetes and obesity; lung diseases such as
asthma, emphysema, cystic fibrosis; pancreatic disorders including pancreatic insufficiency
and cancer; and prostate disorders including prostate cancer, or other pathologies or
conditions. The NOV51 nucleic acid encoding the sulfate transporter-like protein of the
invention, or fragments thereof, may further be useful in diagnostic applications, wherein the
presence or amount of the nucleic acid or the protein are to be assessed.

NOV51 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV51 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti- 
NOVX Antibodies" section below. The disclosed NOV51 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 

NOV52 

A disclosed NOV52 nucleic acid of 2210 nucleotides (also referred to as
CG57748-01) encoding an N-acetylgalactosaminyltransferase-like protein is shown in Table 52A. The
start and stop codons are in bold letters.



Table 52A. NOV52 nucleotide sequence (SEQ ID NO:121). 



TGCAGTTGGCGGGCGCATGTGGGGGCGCACGGCGCGGCGGCGCTGCCCGCGGGAACTGCG
GCGCGGCCGGGAGGCGCTGTTGGTGCTCCTGGCGCTACTGGCGTTGGCCGGGCTGGGCTC 
GGTGCTGCGGGCGCAGCGTGGGGCCGGGGCCGGGGCTGCCGAGCCGGGACCCCCGCGCAC 
CCCGCGCCCCGGGCGGCGCGAGCCGGTCATGCCGCGGCCGCCGGTGCCGGCGAACGCGCT 
GGGCGCGCGGGGCGAGGCGGTGCGGCTGCAGCTGCAGGGCGAGGAGCTGCGGCTGCAGGA 
GGAGAGCGTGCGGCTGCACCAGATTAACATCTACCTCAGCGACCGCATCTCACTGCACCG 
CCGCCTGCCCGAGCGCTGGAACCCGCTGTGCAAAGAGAAGAAATATGATTATGATAATTT 
GCCCAGGACATCTGTTATCATAGCATTTTATAATGAAGCCTGGTCAACTCTCCTTCGGAC 
AGTTTACAGTGTCCTTGAGACATCCCCGGATATCCTGCTAGAAGAAGTGATCCTTGTAGA 
TGACTACAGTGATAGAGAGCACCTGAAGGAGCGCTTGGCCAATGAGCTTTCGGGACTGCC 
CAAGGTGCGCCTGATCCGCGCCAACAAGAGAGAGGGCCTGGTGCGAGCCCGGCTGCTGGG 
GGCGTCTGCGGCGAGGGGCGATGTTCTGACCTTCCTGGACTGTCACTGTGAGTGCCACGA 
AGGGTGGCTGGAGCCGCTGCTGCAGAGGATCCATGAAGAGGAGTCGGCAGTGGTGTGCCC 
GGTGATTGATGTGATCGACTGGAACACCTTCGAATACCTGGGGAACTCCGGGGAGCCCCA 
GATCGGCGGTTTCGACTGGAGGCTGGTGTTCACGTGGCACACAGTTCCTGAGAGGGAGAG 
GATACGGATGCAATCCCCCGTCGATGTCATCAGGTCTCCAACAATGGCTGGTGGGCTGTT 
TGCTGTGAGTAAGAAATATTTTGAATATCTGGGGTCTTATGATACAGGAATGGAAGTTTG 
GGGAGGAGAAAACCTCGAATTTTCCTTTAGGATCTGGCAGTGTGGTGGGGTTCTGGAAAC 
ACACCCATGTTCCCATGTTGGCCATGTTTTCCCCAAGCAAGCTCCCTACTCCCGCAACAA 
GGCTCTGGCCAACAGTGTTCGTGCAGCTGAAGTATGGATGGATGAATTTAAAGAGCTCTA 
CTACCATCGCAACCCCCGTGCCCGCTTGGAACCTTTTGGGGATGTGACAGAGAGGAAGCA 
GCTCCGGGACAAGCTCCAGTGTAAAGACTTCAAGTGGTTCTTGGAGACTGTGTATCCAGA 
ACTGCATGTGCCTGAGGACAGGCCTGGCTTCTTCGGGATGCTCCAGAACAAAGGACTAAC 
AGACTACTGCTTTGACTATAACCCTCCCGATGAAAACCAGATTGTGGGACACCAGGTCAT 
TCTGTACCTCTGTCATGGGATGGGCCAGAATCAGTTTTTCGAGTACACGTCCCAGAAAGA 
AATACGCTATAACACCCACCAGCCTGAGGGCTGCATTGCTGTGGAAGCAGGAATGGATAC 
CCTTATCATGCATCTCTGCGAAGAAACTGCCCCAGAGAATCAGAAGTTCATCTTGCAGGA
GGATGGATCTTTATTTCACGAACAGTCCAAGAAATGTGTCCAGGCTGCGAGGAAGGAGTC 
GAGTGACAGTTTCGTTCCACTCTTACGAGACTGCACCAACTCGGATCATCAGAAATGGTT 
CTTCAAAGAGCGCATGTTATGAAGCCTCGTGTATCAAGGAGCCCCATCGAAGGAGACTGT
GGAGCCAGGACTCTGCCCAACAAAGACTTAGCTAAGCAGTGACCAGAACCACAAAAACTA 
GGCTGGATTGCTTTTGCAAGAGGCAATCATTTGCCCTTTGTGAAAGTGTGTGGATTAGGT 
AACAGTGATAGCTGTACTATTTGGCACCTTCTAATGTTCAAATACCTATTTCCAGGTACT
CAGATGGTACCCTGTTTTTGAATTAACCTTTAATTTTCTTCAAACGTATTTAACACGCGG 
CCTAACTTCTAGACAAGAAAGATCTTCGGGGGTCACAACCCCCGAAGAATTCGGCGGACC
GTCCACCCTGCTACTAGTCACCCGCGGAGCCAACAACGCCAAGCGCTGCATCACACTCTA 
GCACGGCGGCCCACACGAACACATCAAGCAGAGGCAGATACCATAATAGT 



In a search of public sequence databases, the NOV52 nucleic acid sequence, located on
chromosome has 951 of 1468 bases (64%) identical to a
gb:GENBANK-ID:MMU73819|acc:U73819.1 mRNA from Mus musculus (Mus musculus polypeptide
GalNAc transferase-T4 (ppGaNTase-T4) mRNA, complete cds). Public nucleotide databases
include all GenBank databases and the GeneSeq patent database.

The disclosed NOV52 polypeptide (SEQ ID NO:122) encoded by SEQ ID NO:121 has
581 amino acid residues and is presented in Table 52B using the one-letter amino acid code.
SignalP, Psort and/or Hydropathy results predict that NOV52 has a signal peptide and is
likely to be localized extracellularly with a certainty of 0.8200. The most likely cleavage site
for a NOV52 peptide is between amino acids 39 and 40.
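
To make the stated cleavage position concrete, the following minimal sketch splits a protein sequence into a predicted signal peptide and a mature chain. The function name is hypothetical, and the 60-residue N-terminal fragment is the translation of the opening lines of Table 52A rather than a value taken from a prediction tool; the sketch only applies the position quoted in the text and does not predict it.

```python
def split_at_cleavage_site(protein: str, last_signal_residue: int) -> tuple[str, str]:
    """Split a protein after the given 1-based residue number.

    "Cleavage between amino acids 39 and 40" corresponds to
    last_signal_residue=39.
    """
    if not 0 < last_signal_residue < len(protein):
        raise ValueError("cleavage site must fall inside the sequence")
    return protein[:last_signal_residue], protein[last_signal_residue:]


# First 60 residues of NOV52, as translated from the start of Table 52A.
NOV52_N_TERMINUS = "MWGRTARRRCPRELRRGREALLVLLALLALAGLGSVLRAQRGAGAGAAEPGPPRTPRPGR"

signal_peptide, mature_start = split_at_cleavage_site(NOV52_N_TERMINUS, 39)
print(signal_peptide)  # residues 1-39
print(mature_start)    # residues 40 onward (of this fragment)
```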



Table 52B. Encoded NOV52 protein sequence (SEQ ID NO:122). 

MWGRTARRRCPRELRRGREALLVLLALLALAGLGSVLRAQRGAGAGAAEPGPPRTPRPGR
REPVMPRPPVPANALGARGEAVRLQLQGEELRLQEESVRLHQINIYLSDRISLHRRLPER 
WNPLCKEKKYDYDNLPRTSVIIAFYNEAWSTLLRTVYSVLETSPDILLEEVILVDDYSDR 
EHLKERLANELSGLPKVRLIRANKREGLVRARLLGASAARGDVLTFLDCHCECHEGWLEP 
LLQRIHEEESAVVCPVIDVIDWNTFEYLGNSGEPQIGGFDWRLVFTWHTVPERERIRMQS
PVDVIRSPTMAGGLFAVSKKYFEYLGSYDTGMEVWGGENLEFSFRIWQCGGVLETHPCSH 
VGHVFPKQAPYSRNKALANSVRAAEVWMDEFKELYYHRNPRARLEPFGDVTERKQLRDKL
QCKDFKWFLETVYPELHVPEDRPGFFGMLQNKGLTDYCFDYNPPDENQIVGHQVILYLCH 
GMGQNQFFEYTSQKEIRYNTHQPEGCIAVEAGMDTLIMHLCEETAPENQKFILQEDGSLF
HEQSKKCVQAARKESSDSFVPLLRDCTNSDHQKWFFKERML 
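
The relationship between Table 52A and Table 52B can be checked by translating the nucleotide sequence with the standard genetic code. The sketch below is a hedged illustration: the function and constant names are hypothetical, and starting at the first ATG is an assumption (the specification marks the start codon in bold, which plain text does not preserve). Only the first line of Table 52A is used here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from_first_atg(dna: str) -> str:
    """Translate from the first ATG until a stop codon or the end of the input."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""
    residues = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]  # KeyError on non-ACGT characters
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


# First line of Table 52A (SEQ ID NO:121).
TABLE_52A_FIRST_LINE = "TGCAGTTGGCGGGCGCATGTGGGGGCGCACGGCGCGGCGGCGCTGCCCGCGGGAACTGCG"

print(translate_from_first_atg(TABLE_52A_FIRST_LINE))  # MWGRTARRRCPREL
```

The output agrees with the first 14 residues listed in Table 52B.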



A search of sequence databases reveals that the NOV52 amino acid sequence has 330 
of 566 amino acid residues (58%) identical to, and 408 of 566 amino acid residues (72%) 
similar to, the 578 amino acid residue ptnr:SPTREMBL-ACC:O08832 protein from Mus 
musculus (Mouse) (POLYPEPTIDE GALNAC TRANSFERASES). Public amino acid 
databases include the GenBank databases, SwissProt, PDB and PIR. 

NOV52 is expressed in at least bone marrow, colon, lung, ovary, kidney, respiratory
bronchiole, stomach, testis, tonsils, and germ cells. Expression information was derived from
the tissue sources of the sequences that were included in the derivation of the sequence of
CG57748-01. The sequence is predicted to be expressed in the following tissues because of the
expression pattern of (GENBANK-ID: gb:GENBANK-ID:MMU73819|acc:U73819.1) a
closely related Mus musculus polypeptide GalNAc transferase-T4 (ppGaNTase-T4) mRNA,
complete cds homolog in species Mus musculus: spleen. This information was derived by
determining the tissue sources of the sequences that were included in the invention including
but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE
sources.

The disclosed NOV52 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 52C.



Table 52C. BLAST results for NOV52

Gene Index/Identifier                           Protein/Organism                                      Length (aa)  Identity (%)    Positives (%)   Expect
gi|7657112|ref|NP_056552.1| (NM_015737)         UDP-N-acetyl-alpha-D-galactosamine:polypeptide        578          330/570 (57%)   408/570 (70%)   e-180
                                                N-acetylgalactosaminyltransferase 4;
                                                ppGaNTase-T4 [Mus musculus]
gi|4503901|ref|NP_003765.1| (NM_003774)         polypeptide N-acetylgalactosaminyltransferase 4;      578          324/570 (56%)   402/570 (69%)   e-179
                                                UDP-N-acetyl-alpha-D-galactosamine:polypeptide
                                                N-acetylgalactosaminyltransferase 4;
                                                GalNAc transferase 4; UDP-GalNAc:polypeptide
                                                N-acetylgalactosaminyltransferase 4
gi|13375881|ref|NP_078918.1| (NM_024642)        protein FLJ21212 [Homo sapiens]                       284          284/284 (100%)  284/284 (100%)  e-174
gi|15530299|gb|AAH13945.1|AAH13945 (BC013945)   Similar to hypothetical protein FLJ21212              272          272/272 (100%)  272/272 (100%)  e-166
                                                [Homo sapiens]
gi|14530626|emb|CAC42368.1| (AL110487)          cDNA EST EMBL:AF031835 comes from this gene           623          248/529 (46%)   322/529 (59%)   e-128
                                                [Caenorhabditis elegans]



Tables 52D-E list the domain descriptions from DOMAIN analysis results of NOV52. 
This indicates that the NOV52 sequence has properties similar to those of other proteins 
known to contain this domain. 






Table 52D. Domain Analysis of NOV52 

gnl|Pfam|pfam00535, Glycos_transf_2, Glycosyl transferase. Diverse
family, transferring sugar from UDP-glucose, UDP-N-acetyl-galactosamine,
GDP-mannose or CDP-abequose, to a range of substrates
including cellulose, dolichol phosphate and teichoic acids.

CD-Length = 168 residues, 100.0% aligned

Score = 109 bits (273), Expect = 4e-25



Table 52E. Domain Analysis of NOV52 



gnl|Smart|smart00458, RICIN, Ricin-type beta-trefoil;
Carbohydrate-binding domain formed from presumed gene triplication

CD-Length = 125 residues, 97.6% aligned
Score = 78.2 bits (191), Expect = 1e-15



NOV52 has homology with uridine diphosphate (UDP)-GalNAc:polypeptide
N-acetylgalactosaminyltransferase (GalNAc transferase), a member of the glycosyl transferase
family. This enzyme catalyzes the initial step in mucin-type O-glycosylation of specific
proteins. Glycosylation of cell surface proteins is critical to normal development, immune
response and tissue functions, as evidenced by the phenotypes of a number of mouse knockout
models (See Muramatsu; J Biochem (Tokyo) 2000 Feb;127(2):171-6). Glycosylation patterns
are known to change during the process of carcinogenesis (Kohsaki et al., J Gastroenterol
2000;35(11):840-8). Alterations of these patterns by introducing a transgene coding for a
GalNAc transferase (See Tsurifune et al., Int J Oncol 2000 Jul;17(1):159-65) or by means of
antisense oligonucleotides (See Zeng et al., Proc Natl Acad Sci USA 1995 Sep
12;92(19):8670-4) alter cell morphology, growth and adhesion patterns. Therefore these
proteins are important markers and therapeutic targets for oncology applications. In addition, a
member of this family has been implicated in autosomal dominant hypophosphatemic rickets
(See White et al., Gene 2000 Apr 4;246(1-2):347-56).

Glycosyl transferases comprise a fairly diverse group of proteins that catalyze the
addition of sugar from UDP-glucose, UDP-N-acetyl-galactosamine, GDP-mannose or
CDP-abequose, to a range of substrates including cellulose, dolichol phosphate and teichoic acids.

The disclosed NOV52 nucleic acid of the invention encoding an
N-acetylgalactosaminyltransferase-like protein includes the nucleic acid whose sequence is
provided in Table 52A or a fragment thereof. The invention also includes a mutant or variant
nucleic acid any of whose bases may be changed from the corresponding base shown in Table
52A while still encoding a protein that maintains its N-acetylgalactosaminyltransferase-like
activities and physiological functions, or a fragment of such a nucleic acid. The invention
further includes nucleic acids whose sequences are complementary to those just described,
including nucleic acid fragments that are complementary to any of the nucleic acids just
described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 36 percent of the
bases may be so changed.
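
One way to read the "up to about 36 percent" limit is as a position-by-position comparison between the reference sequence and a candidate variant. The sketch below illustrates that reading under stated assumptions: both sequences have equal length (insertions and deletions are not handled), the function names are hypothetical, and the variant shown is an invented example, not a sequence from the specification.

```python
def percent_changed(reference: str, variant: str) -> float:
    """Percentage of positions at which variant differs from reference."""
    if len(reference) != len(variant):
        raise ValueError("this sketch only compares equal-length sequences")
    if not reference:
        raise ValueError("sequences must not be empty")
    differences = sum(a != b for a, b in zip(reference.upper(), variant.upper()))
    return 100.0 * differences / len(reference)


def within_variant_limit(reference: str, variant: str, max_percent: float = 36.0) -> bool:
    """True if no more than max_percent of the bases differ."""
    return percent_changed(reference, variant) <= max_percent


# First 20 bases of Table 52A and a hypothetical variant with two substitutions.
reference = "TGCAGTTGGCGGGCGCATGT"
hypothetical_variant = "AGCAGCTGGCGGGCGCATGT"

print(percent_changed(reference, hypothetical_variant))       # 10.0
print(within_variant_limit(reference, hypothetical_variant))  # True
```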

The disclosed NOV52 protein of the invention includes the
N-acetylgalactosaminyltransferase-like protein whose sequence is provided in Table 52B. The
invention also includes a mutant or variant protein any of whose residues may be changed
from the corresponding residue shown in Table 52B while still encoding a protein that
maintains its N-acetylgalactosaminyltransferase-like activities and physiological functions, or
a functional fragment thereof. In the mutant or variant protein, up to about 42 percent of the
residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this
N-acetylgalactosaminyltransferase-like protein (NOV52) may function as a member of an
"N-acetylgalactosaminyltransferase family". Therefore, the NOV52 nucleic acids and proteins
identified here may be useful in potential therapeutic applications implicated in (but not
limited to) various pathologies and disorders as indicated below. The potential therapeutic
applications for this invention include, but are not limited to: protein therapeutic, small
molecule drug target, antibody target (therapeutic, diagnostic, drug targeting/cytotoxic
antibody), diagnostic and/or prognostic marker, gene therapy (gene delivery/gene ablation),
research tools, tissue regeneration in vivo and in vitro of all tissues and cell types composing
(but not limited to) those defined here.

The NOV52 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the
N-acetylgalactosaminyltransferase-like protein (NOV52) may be useful in gene therapy, and the
N-acetylgalactosaminyltransferase-like protein (NOV52) may be useful when administered to
a subject in need thereof. By way of nonlimiting example, the compositions of the present
invention will have efficacy for treatment of patients suffering from hemophilia,
hypercoagulation, idiopathic thrombocytopenic purpura, immunodeficiencies, graft versus
host disease, hypercoagulation, autoimmune disease, allergies, transplantation, Hirschsprung's
disease, Crohn's Disease, appendicitis, systemic lupus erythematosus, autoimmune disease,
asthma, emphysema, scleroderma, allergy, ARDS, endometriosis, fertility, diabetes, renal
artery stenosis, interstitial nephritis, glomerulonephritis, polycystic kidney disease, systemic
lupus erythematosus, renal tubular acidosis, IgA nephropathy, hypercalcemia, Lesch-Nyhan
syndrome, hypercalcemia, ulcers, fertility, hypogonadism, polycystic ovarian syndrome,
cancer, tissue degeneration, bacterial/viral/parasitic infection, or other pathologies or
conditions. The NOV52 nucleic acid encoding the N-acetylgalactosaminyltransferase-like
protein of the invention, or fragments thereof, may further be useful in diagnostic applications,
wherein the presence or amount of the nucleic acid or the protein are to be assessed.

NOV52 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV52 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti- 
NOVX Antibodies" section below. The disclosed NOV52 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 

NOV53 

A disclosed NOV53 nucleic acid of 2030 nucleotides (also referred to as CG57693-01)
encoding a protein kinase-like protein is shown in Table 53A. The start and stop codons are in
bold letters.



Table 53A. NOV53 nucleotide sequence (SEQ ID NO:123). 

TTTTTTTTTTTTTTGACAATCACCCAGTCAGTATTTATTAGGCGCCTACTGCGGACGGTT 
GGTCTTCACGCAGATCAGGCGAGAAGGGATAGTGATGCGCGGGCCGCTGCGGAGACCCGG 
AGCCCGCCGCGGATCACGGGAATTTCGCGCCTATTTTTTGTTGGCTGGGTGTTCTCGCCA 
GTGATTGGGTCCCGGCAAGGCGTGCCGGTGCGCTCGGCTGCGGCTGCTCCGTCGACCTTT 
GCAGGACCGGGCGGTGCAGGGCTCACTCGGCTGGCGTCCCGGGGGATGGGCCACCAGGAG
TCTCCGCTGGCCCGGGCGCCGGCGGGAGGTGCAGCTTATGTAAAGAGGTTATGTAAAGGG 
CTCAGCTGGCGCGAACACGTGGAAAGCCACGGGAGCCTAGGAGCCCAGGCTTCCCCAGCG 






AGCGCCGCGGCAGCAGAAGGATCCGCTACACGCCGGGCTCGGGCCGCCACCTCCCGCGCT 
GCTCGGTCCCGGAGGCAGCCCGGGCCCGGAGCGGACCATCCCCAGGCAGGGGCTCCAGGG 
GGGAAACGGGCCGCCCGGAAGTGGAGGTGCGCGGGCCAGGTCACAATCCAAGGTCCGGCT 
CCTCCGCGTCCCAGGGCCGGACGGAGGGATGAGGCAGGGGGGGCCCGGGCAGCGCCGTTG 
CTGCTCCCCCCGCCGCCCGCAGCCATGGAAACGGGGAAGGACGGCGCCCGCAGAGGTACA 
CAAAGCCCGGAGCGGAAAAGGCGAAGCCCAGTGCCGCGGGCGCCCAGCACGAAGCTGAGG 
CCGGCGGCGGCGGCCCGGGCCATGGATCCGGTGGCGGCCGAGGCCCCGGGCGAGGCCTTC 
CTGGCGCGGCGACGGCCTGAGGGCGGTGGCGGGTCCGCGCGGCCGCGTTACAGCCTGTTG 
GCGGAGATCGGGCGCGGCAGCTACGGCGTGGTTTATGAGGCAGTGGCCGGGCGCAGCGGG 
GCCCGGGTGGCGGTCAAGAAGATCCGCTGCGACGCCCCCGAGAACGTGGAGCTGGCGCTG 
GCTGAATTCTGGGCCCTCACCAGCCTCAAGCGGCGCCACCAGAACGTCGTGCAGTTTGAG 
GAGTGCGTCCTGCAGCGCAATGGGTTAGCCCAGCGCATGAGTCACGGCAACAAGAGCTCG 
CAGCTTTACCTGCGCCTGGTGGAGACCTCGCTGAAAGAAAGGATCCTGGGTTATGCTGAG 
GAGCCCTGCTATCTCTGGTTTGTCATGGAGTTCTGTGAAGGTGGAGACCTGAATCAGTAT 
GTCCTGTCCCGGAGGCCAGACCCAGCCACCAACAAAAGTTTCATGCTACAGCTGACGAGC 
GCCATTGCCTTCCTGCACAAAAACCATATTGTGCACAGGGACCTGAAGCCAGACAACATC 
CTCATCACAGAGCGGTCTGGCACCCCCATCCTCAAAGTGGCCGACTTTGGACTAAGCAAG 
GTCTGTGCTGGGCTGGCACCCCGAGGCAAAGAGGGCAATCAAGACAACAAAAATGTGAAT 
GTGAATAAGTACTGGCTGTCCTCAGCCTGCGGTTCGGACTTCTACATGGCTCCTGAAGTC 
TGGGAGGGACACTACACAGCCAAGGCGGACATCTTTGCCCTGGGCATTATCATCTGGGCA 
ATGATAGAAAGAATCACTTTTATTGACTCTGAGACCAAGAAGGAGCTCCTGGGGACCTAC
ATTAAACAGGGGACTGAGATCGTCCCTGTTGGTGAGGCGCTGCTAGAAAACCCAAAGATG 
GAGTTGCACATCCCCCAAAAACGCAGGACTTCCATGTCTGAGGGGATCAAGCAGCTCTTG 
AAAGATATGTTAGCTGCTAACCCACAGGACCGGCCTGATGCCTTTGAACTTGAAACCAGA 
ATGGACCAGGTCACATGTGCTGCTTAAAATTCAGGGCTAAGCATTTTGGGTGATTTTAAA
CTAGGTCGATTCCTCGGGACCCACAGTCTCACCACGTCTCCTCCAGAGGACGGCAGAGGG
TACAGGTGGTGGCCTGGCCGGTTGGCGATCTCCCGACAGCTGGATCCGGC 



In a search of public sequence databases, the NOV53 nucleic acid sequence, located on
chromosome 20, has 262 of 361 bases (72%) identical to a
gb:GENBANK-ID:AB041802|acc:AB041802.1 mRNA from Mus musculus (Mus musculus brain cDNA,
clone MNCb-1723). Public nucleotide databases include all GenBank databases and the
GeneSeq patent database.

The disclosed NOV53 polypeptide (SEQ ID NO: 124) encoded by SEQ ID NO: 123 
has 533 amino acid residues and is presented in Table 53B using the one-letter amino acid 
code. Signal P, Psort and/or Hydropathy results predict that NOV53 has no signal peptide and 
is likely to be localized to the cytoplasm with a certainty of 0.8500. 



Table 53B. Encoded NOV53 protein sequence (SEQ ID NO: 124). 



MGHQESPLARAPAGGAAYVKRLCKGLSWREHVESHGSLGAQASPASAAAAEGSATRRARA
ATSRAARSRRQPGPGADHPQAGAPGGKRAARKWRCAGQVTIQGPAPPRPRAGRRDEAGGA 
RAAPLLLPPPPAAMETGKDGARRGTQSPERKRRSPVPRAPSTKLRPAAAARAMDPVAAEA 
PGEAFLARRRPEGGGGSARPRYSLLAEIGRGSYGVVYEAVAGRSGARVAVKKIRCDAPEN
VELALAEFWALTSLKRRHQNVVQFEECVLQRNGLAQRMSHGNKSSQLYLRLVETSLKERI 
LGYAEEPCYLWFVMEFCEGGDLNQYVLSRRPDPATNKSFMLQLTSAIAFLHKNHIVHRDL
KPDNILITERSGTPILKVADFGLSKVCAGLAPRGKEGNQDNKNVNVNKYWLSSACGSDFY 
MAPEVWEGHYTAKADIFALGIIIWAMIERITFIDSETKKELLGTYIKQGTEIVPVGEALL 
ENPKMELHIPQKRRTSMSEGIKQLLKDMLAANPQDRPDAFELETRMDQVTCAA



A search of sequence databases reveals that the NOV53 amino acid sequence has 517
of 517 amino acid residues (100%) identical to, and 517 of 517 amino acid residues (100%)
similar to, the 517 amino acid residue ptnr:TREMBLNEW-ACC:CAC10518 protein from
Homo sapiens (Human) (BA550O8.2 (NOVEL PROTEIN KINASE)). Public amino acid
databases include the GenBank databases, SwissProt, PDB and PIR.

NOV53 is expressed in at least Adrenal Gland/Suprarenal gland, Lymphoid tissue, 
Oviduct/Uterine Tube/Fallopian tube, Peripheral Blood, Placenta, Retina, Thymus. This 
information was derived by determining the tissue sources of the sequences that were included 
in the invention including but not limited to SeqCalling sources, Public EST sources, 
Literature sources, and/or RACE sources. 

The disclosed NOV53 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 53C.



Table 53C. BLAST results for NOV53

Gene Index/Identifier                           Protein/Organism                                      Length (aa)  Identity (%)    Positives (%)   Expect
gi|12830335|emb|CAC10518.2| (AL359916)          bA550O8.2 (novel protein kinase) [Homo sapiens]       517          517/517 (100%)  517/517 (100%)  0.0
gi|18592261|ref|XP_086681.1| (XM_086681)        serine/threonine kinase 35 [Homo sapiens]             401          400/401 (99%)   400/401 (99%)   0.0
gi|16878290|gb|AAH17340.1|AAH17340 (BC017340)   (protein for IMAGE:4869353) [Homo sapiens]            398          395/398 (99%)   395/398 (99%)   0.0
gi|18549074|ref|XP_086530.1| (XM_086530)        similar to Cell division control protein 2 homolog    161          101/156 (64%)   129/156 (81%)   e-56
                                                (P34 protein kinase) [Homo sapiens]
gi|15224378|ref|NP_181320.1| (NM_129340)        putative protein kinase [Arabidopsis thaliana]        257          92/314 (29%)    143/314 (45%)   e-23



Tables 53D-F list the domain descriptions from DOMAIN analysis results against 
NOV53. This indicates that the NOV53 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 53D. Domain Analysis of NOV53 

gnl|Smart|smart00220, S_TKc, Serine/Threonine protein kinases,
catalytic domain; Phosphotransferases. Serine or threonine-specific
kinase subfamily.

CD-Length = 256 residues, 97.7% aligned 

Score = 193 bits (491), Expect = 2e-50 

Table 53E. Domain Analysis of NOV53

gnl|Pfam|pfam00069, pkinase, Protein kinase domain 

CD-Length = 256 residues, 97.7% aligned 

Score = 189 bits (481), Expect = 3e-49 

Table 53F. Domain Analysis of NOV53

gnl|Smart|smart00219, TyrKc, Tyrosine kinase, catalytic domain;
Phosphotransferases. Tyrosine-specific kinase subfamily

CD-Length = 258 residues, 99.6% aligned 

Score = 121 bits (303), Expect = 1e-28

Protein phosphorylation is a fundamental process for the regulation of cellular 
functions. The coordinated action of both protein kinases and phosphatases controls the levels 
of phosphorylation and, hence, the activity of specific target proteins. One of the predominant 
roles of protein phosphorylation is in signal transduction, where extracellular signals are 
amplified and propagated by a cascade of protein phosphorylation and dephosphorylation 
events. Eukaryotic protein kinases are enzymes that belong to a very extensive family of 
proteins which share a conserved catalytic core common with both serine/threonine and 
tyrosine protein kinases. There are a number of conserved regions in the catalytic domain of 
protein kinases. In the N-terminal extremity of the catalytic domain there is a glycine-rich 
stretch of residues in the vicinity of a lysine residue, which has been shown to be involved in 
ATP binding. In the central part of the catalytic domain there is a conserved aspartic acid 
residue which is important for the catalytic activity of the enzyme. Protein kinases are 
excellent small molecule drug targets for therapeutic intervention. 

The disclosed NOV53 nucleic acid of the invention encoding a protein kinase-like 
protein includes the nucleic acid whose sequence is provided in Table 53A or a fragment 
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may 
be changed from the corresponding base shown in Table 53A while still encoding a protein
that maintains its protein kinase-like activities and physiological functions, or a fragment of 
such a nucleic acid. The invention further includes nucleic acids whose sequences are 
complementary to those just described, including nucleic acid fragments that are 
complementary to any of the nucleic acids just described. The invention additionally includes 
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include 
chemical modifications. Such modifications include, by way of nonlimiting example, 
modified bases, and nucleic acids whose sugar phosphate backbones are modified or 
derivatized. These modifications are carried out at least in part to enhance the chemical 
stability of the modified nucleic acid, such that they may be used, for example, as antisense 
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic 
acids, and their complements, up to about 28 percent of the bases may be so changed. 

The disclosed NOV53 protein of the invention includes the protein kinase-like protein 
whose sequence is provided in Table 53B. The invention also includes a mutant or variant 
protein any of whose residues may be changed from the corresponding residue shown in Table 
53B while still encoding a protein that maintains its protein kinase-like activities and 
physiological functions, or a functional fragment thereof. In the mutant or variant protein, up 
to about 0 percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or 
(Fab)2, that bind immunospecifically to any of the proteins of the invention. 

The above defined information for this invention suggests that this protein kinase-like 
protein (NOV53) may function as a member of a "protein kinase family". Therefore, the 
NOV53 nucleic acids and proteins identified here may be useful in potential therapeutic 
applications implicated in (but not limited to) various pathologies and disorders as indicated 
below. The potential therapeutic applications for this invention include, but are not limited to: 
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug 
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene 
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues 
and cell types composing (but not limited to) those defined here. 

The NOV53 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the protein kinase-like 
protein (NOV53) may be useful in gene therapy, and the protein kinase-like protein (NOV53) 
may be useful when administered to a subject in need thereof. By way of nonlimiting 
example, the compositions of the present invention will have efficacy for treatment of patients 
suffering from adrenoleukodystrophy, congenital adrenal hyperplasia, anemia, 
ataxia-telangiectasia, autoimmune disease, fertility, hemophilia, hypercoagulation, idiopathic 
thrombocytopenic purpura, graft versus host disease, allergies, immunodeficiencies, 
transplantation, graft versus host disease (GVHD), lymphaedema, Von Hippel-Lindau (VHL) 
syndrome, diabetes, tuberous sclerosis, or other pathologies or conditions. The NOV53 nucleic 
acid encoding the protein kinase-like protein of the invention, or fragments thereof, may 
further be useful in diagnostic applications, wherein the presence or amount of the nucleic acid 
or the protein is to be assessed. 

NOV53 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV53 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the 
"Anti-NOVX Antibodies" section below. The disclosed NOV53 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 



NOV54 

A disclosed NOV54 nucleic acid of 3331 nucleotides (also referred to as CG57707- 
01) encoding a Leucine-rich glioma-inactivated protein precursor -like protein is shown in 
Table 54A. The start and stop codons are in bold letters. 



Table 54A. NOV54 nucleotide sequence (SEQ ID NO: 125). 

ATGGCGCTGCGGAGAGGCGGCTGCGGAGCGCTCGGGCTGCTGCTGCTGCTGCTGGGCGCC 
GCGTGCCTGATACCGCGGAGCGCGCAGGTGAGGCGGCTGGCGCGCTGCCCCGCCACTTGC 
AGCTGTACCAAGGAGTCTATCATCTGCGTGGGCTCTTCCTGGGTGCCCAGGATCGTGCCG 
GGCGACATCAGCTCCCTGAGCCTGGTAAATGGGACGTTTTCAGAAATCAAGGACCGAATG 
TTTTCCCATCTGCCTTCTCTGCAGCTGCTATTGCTGAATTCTAACTCATTCACGATCATC 
CGGGATGATGCTTTTGCTGGACTTTTTCATCTTGAATACCTGTTCATTGAAGGGAACAAA 
ATAGAAACCATTTCAAGAAATGCCTTTCGTGGCCTCCGTGACCTGACTCACCTTTCTTTG 
GCCAATAACCACATAAAAGCACTACCAAGGGATGTCTTCAGTGATTTAGACTCTCTGATT 
GAACAGATTTTGAGGGGTAATAAATTTGAATGTGACTGCAAAGCCAAGTGGCTATACCTG 
TGGTTGAAGATGACAAATTCCACCGTTTCTGATGTGCTGTGTATTGGTCCACCAGAGTAT 
CAGGAAAAGAAGCTAAATGACGTGACCAGCTTTGACTATGAATGCACAACTACAGATTTT 
GTTGTTCATCAGACTTTACCCTACCAGTCGGTTTCAGTGGATACGTTCAACTCCAAGAAC 
GATGTGTACGTGGCCATCGCGCAGCCCAGCATGGAGAACTGCATGGTGCTGGAGTGGGAC 
CACATTGAAATGAATTTCCGGAGCTATGACAACATTACAGGTCAGTCCATCGTGGGCTGT 
AAGGCCATTCTCATCGATGATCAGGTCTTTGTGGTGGTAGCCCAGCTCTTCGGTGGCTCT 
CACATTTACAAATACGACGAGAGTTGGACCAAATTTGTCAAATTCCAAGACATAGAGGTC 
TCTCGCATTTCCAAGCCCAATGACATCGAGCTGTTTCAGATCGACGACGAGACGTTCTTT 
GTCATCGCAGACAGCTCAAAGGCTGGTCTGTCCACAGTTTATAAATGGAACAGCAAAGGA 
TTCTATTCTTACCAGTCACTGCACGAGTGGTTCAGGGACACGGATGCGGAGTTTGTTGAT 
ATCGATGGAAAATCGCATCTCATCCTGTCCAGCCGCTCCCAGGTCCCCATCATCCTCCAG 
TGGAATAAAAGCTCTAAGAAGTTTGTCCCCCATGGTGACATCCCCAACATGGAGGACGTA 
CTGGCTGTGAAGAGCTTCCGAATGCAAAATACCCTCTACCTTTCCCTTACCCGCTTCATC 
GGGGACTCCCGGGTCATGAGGTGGAACAGTAAGCAGTTTGTGGAGATCCAAGCTCTTCCA 
TCCCGGGGGGCCATGACCCTGCAGCCCTTTTCTTTTAAAGATAATCACTACCTGGCCCTG 
GGGAGTGACTATACATTCTCTCAGATATACCAGTGGGATAAAGAGAAGCAGCTATTCAAA 
AAGTTTAAGGAGATTTATGTGCAGGCGCCTCGTTCATTCACAGCTGTCTCCACCGACAGG 
AGAGATTTCTTTTTTGCATCCAGTTTCAAAGGGAAAACAAAGATTTTTGAACATATAATT 
GTTGACTTAAGTTTGTGAAGGTGTGGTGGGTGAAACTAAGAGAAATGTAGCATTAGCTCT 
CACAAAAGAGGACCAAGAAAAATCAACAAACAAATCAAAGCCAGGCTCAGAGCTCTGAAA 
TTAAAAAGCACTGAAATAGTTAGATGTTTTCAAACTTTTAGAACTCACATTTTAATCAGG 
GATTGCATTTATTGGCTAACTGCATGACATGCCCATTCTACCATTTTAAAAAAAATCTTA 
AAGCCTGTAATTTCTGAGAAAAGAGTACAGCATTTACTCTTATCATCTAGAAATGTAATA 
TGCTTCCCCCCCGCTTTTTGATGAGGAAGAAGACAATTGGATAAGATGGGACAGCACTTA 
TAATGAAATAAAAAAAAACTTTGAGCCCCTCTCATTCCACTTTAGCAATCTTTTTGGTAA 
GAACTCTTAAAGCCAAAAGTCTGCTGAAAAGATTTGCTGATTATTAGTTTAAAAATCTTG 
TAACACTCAGCAGTGCTATTTTGAGTCATCCCAGTTTCCTGAAAGTAATGCCCAGTCTTC 
CTGAATCCTCCTTAATAGCAGAACCTTGGTGATTTTGTTGGCTCATATGAATGCTTGTCA 
TGGATATGTTAACAATTTAGTGTTTGACATTGCTTCCTCTGCCACAAAGACAATACTCTG 
GTGACACATGTCTAGGCCCAGCACAGGCTGTAGGCCCAGGAGTGACTCAAAGGAGTTTTT 
CCCTCTTTCTTACGGTTCAAAGGTGACCCTGGTGGTGGCCAGAGCAGTAATGCTTGTTTG 
ATGCTCTTCATGGCTCATCTGCTTCTCAGAACCCACCCGTTGAGTTTGTGGGTAACCAGC 
AGGCAGGCCAAAGACTGGTGCTTTTCATTTCATCCTTTAGAGGGATGAAACAGTTATTTC 
CGTCTGATGAGCATTCGGTAGAATTTTTGAAGTGAGATTTTATGAAGTCAAAGGGGACTT 
TACACAGATCTCGACCTGCTTTGAAACCTAGAGGTGGCCCTTTGATTTGTGCGTGTCCTT 
GCCCTCTGGACAACTTAATATTTCAAGTAATCGAATACCAACTTCCCTGCCAGCCCACCT 
GCCTTCCGCCCCGCTTGTGTAACAGTCCTGTTTTGTTGAGTTGCTGCTATTGCACTGCCA 
GTGCAGCCCACACCAAATCACAACCCAAGATACTCAGATAGGAAGACTCCTTCCTCTCCC 
AGTACTTTACCAAAGGAACCCCCGCCAGGACCCACATGGGGCCACGTGTTGGCAGTGGAA 
TCAGCCTGTGCAGGCTGGGGATCTCAGGCTGATCAGTAGGGGCCAGCTTTGGAGCCAGCC 
AAGCTGAATCCCACACTCCAGGTCTGTGCTCAAGAGACCAGATGGTGTATTTCCAAATGG 
GCCTCTCTGGTATGGGCAATAGGCAAGCTCCTGGGGTCTGGTTATGTGGAAGATTCTTAG 
TGGATGTTCCGCCTGGTTAGCTGGTTCTCTTCAGAGAATATAAAGTGAATGCCTTTAGGG 
GTAGCTCTGAAAGAGAAACCCAACAACTTCATTCCTAGCCATGAAAGTAGCACGATCATA 
TTGTACTGTATTGTTATTGTAAAATGACTATTTGCCATGTCATGAGTAGGTAGATGTTTT 
GCCACAAATATGAATGTGTTTGTTGTTTCCTGACTTTAAGCAATGAAGATTGAGACAATA 
AATAGCACTCAGAGAATGAAGCATTGATGTT 






In a search of public sequence databases, the NOV54 nucleic acid sequence, located on 
chromosome 4p16, has 682 of 1052 bases (64%) identical to a 
gb:GENBANK-ID:AF055636|acc:AF055636.1 mRNA from Homo sapiens (Homo sapiens leucine-rich 
glioma-inactivated protein precursor (LGI1) mRNA, complete cds). Public nucleotide 
databases include all GenBank databases and the GeneSeq patent database. 

The disclosed NOV54 polypeptide (SEQ ID NO:126) encoded by SEQ ID NO:125 
has 545 amino acid residues and is presented in Table 54B using the one-letter amino acid 
code. Signal P, Psort and/or Hydropathy results predict that NOV54 has a signal peptide and 
is likely to be localized extracellularly with a certainty of 0.8200. The most likely 
cleavage site for a NOV54 peptide is between amino acids 22 and 23. 
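
The relationship between the nucleotide sequence of Table 54A and the encoded polypeptide can be checked with a short script. The following Python sketch is illustrative only. It assumes the standard genetic code, uses just the first 120 nucleotides of Table 54A, and introduces hypothetical helper names (translate, split_signal_peptide) that are not part of the disclosure. The cleavage position is the predicted site stated above (between residues 22 and 23), not an experimentally confirmed one.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumption: no alternative code).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides):
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    residues = []
    for i in range(0, len(nucleotides) - 2, 3):
        aa = CODON_TABLE[nucleotides[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


def split_signal_peptide(protein, cleavage_after=22):
    """Split a precursor into (signal peptide, mature chain) at a predicted site."""
    return protein[:cleavage_after], protein[cleavage_after:]


# First two 60-nucleotide lines of Table 54A, starting at the ATG start codon.
NOV54_5PRIME = (
    "ATGGCGCTGCGGAGAGGCGGCTGCGGAGCGCTCGGG"
    "CTGCTGCTGCTGCTGCTG"
    "GGCGCC"
    "GCGTGCCTGATACCGCGGAGCGCGCAGGTGAGGCGGCTGGCGCGCTGCCCCGCCACTTGC"
)

peptide = translate(NOV54_5PRIME)
signal, mature = split_signal_peptide(peptide)
print(peptide)
print(signal)
print(mature)
```

Running the same translation over the full open reading frame would be one way to cross-check the residues listed in Table 54B against Table 54A.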



Table 54B. Encoded NOV54 protein sequence (SEQ ID NO:126). 



MALRRGGCGALGLLLLLLGAACLIPRSAQVRRLARCPATCSCTKESIICVGSSWVPRIVP 
GDISSLSLVNGTFSEIKDRMFSHLPSLQLLLLNSNSFTIIRDDAFAGLFHLEYLFIEGNK 
IETISRNAFRGLRDLTHLSLANNHIKALPRDVFSDLDSLIEQILRGNKFECDCKAKWLYL 
WLKMTNSTVSDVLCIGPPEYQEKKLNDVTSFDYECTTTDFVVHQTLPYQSVSVDTFNSKN 
DVYVAIAQPSMENCMVLEWDHIEMNFRSYDNITGQSIVGCKAILIDDQVFVVVAQLFGGS 
HIYKYDESWTKFVKFQDIEVSRISKPNDIELFQIDDETFFVIADSSKAGLSTVYKWNSKG 
FYSYQSLHEWFRDTDAEFVDIDGKSHLILSSRSQVPIILQWNKSSKKFVPHGDIPNMEDV 
LAVKSFRMQNTLYLSLTRFIGDSRVMRWNSKQFVEIQALPSRGAMTLQPFSFKDNHYLAL 
GSDYTFSQIYQWDKEKQLFKKFKEIYVQAPRSFTAVSTDRRDFFFASSFKGKTKIFEHII 
VDLSL 



A search of sequence databases reveals that the NOV54 amino acid sequence has 301 
of 538 amino acid residues (55%) identical to, and 386 of 538 amino acid residues (71%) 
similar to, the 557 amino acid residue ptnr:SPTREMBL-ACC:O95970 protein from Homo 
sapiens (Human) (LEUCINE-RICH GLIOMA-INACTIVATED PROTEIN PRECURSOR). 
Public amino acid databases include the GenBank databases, SwissProt, PDB and PIR. 
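
The identity and similarity percentages quoted for NOV54 are simple ratios of matching positions to aligned positions. As a minimal sketch, the Python function below (percent_identity is a hypothetical name, not part of the disclosure) reproduces the quoted figures under the assumption that the percentage is truncated to an integer rather than rounded. That assumption is inferred from the numbers in this section, for example 301/538 is about 55.9% and is reported as 55%, and is not a statement about how the search software computes its output.

```python
def percent_identity(matches, aligned_length):
    """Return matches/aligned_length as an integer percentage, truncated."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= matches <= aligned_length:
        raise ValueError("matches must lie between 0 and aligned_length")
    return (100 * matches) // aligned_length


# Figures quoted in the text for NOV54 (matches, aligned positions).
quoted = {
    "nucleotide identity": (682, 1052),
    "amino acid identity": (301, 538),
    "amino acid similarity": (386, 538),
}

for label, (matches, length) in quoted.items():
    print(f"{label}: {matches}/{length} = {percent_identity(matches, length)}%")
```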

NOV54 is expressed in at least appendix, brain, colon, heart, kidney, ovary, pancreas, 
parathyroid gland, uterus, and vein. This information was derived by determining the tissue 
sources of the sequences that were included in the invention including but not limited to 
SeqCalling sources, Public EST sources, Literature sources, and/or RACE sources. 

The disclosed NOV54 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 54C. 



Table 54C. BLAST results for NOV54 

Gene Index/Identifier                      Protein/Organism                                                Length (aa)  Identity (%)    Positives (%)   Expect 
gi|15620891|dbj|BAB67809.1| (AB067503)     KIAA1916 protein [Homo sapiens]                                 542          539/542 (99%)   539/542 (99%)   0.0 
gi|9938002|ref|NP_064674.1| (NM_020278)    leucine-rich, glioma inactivated 1 [Mus musculus]               557          296/516 (57%)   378/516 (72%)   e-178 
gi|4826816|ref|NP_005088.1| (NM_005097)    leucine-rich, glioma inactivated 1 precursor [Homo sapiens]     557          296/516 (57%)   379/516 (73%)   e-178 
gi|15722102|emb|CAC78757.1| (AL358154)     bA512J3.1 (leucine-rich, glioma inactivated 1) [Homo sapiens]   461          269/460 (58%)   342/460 (73%)   e-160 
gi|18591028|ref|XP_092048.1| (XM_092048)   protein XP_092048 [Homo sapiens]                                466          179/437 (40%)   267/437 (60%)   3e-93 



Table 54D lists the domain descriptions from DOMAIN analysis results against 
NOV54. This indicates that the NOV54 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 54D. Domain Analysis of NOV54 

gnl|Pfam|pfam01463, LRRCT, Leucine rich repeat C-terminal domain. 
Leucine Rich Repeats pfam00560 are short sequence motifs present in a 
number of proteins with diverse functions and cellular locations. 
Leucine Rich Repeats are often flanked by cysteine rich domains. This 
domain is often found at the C-terminus of tandem leucine rich 
repeats. 

CD-Length = 51 residues, 98.0% aligned 

Score = 43.1 bits (100), Expect = 4e-05 



Loss of heterozygosity for 10q23-26 is seen in over 80% of glioblastoma multiforme 
tumors. Positional cloning was used to isolate the LGI1 (Leucine-rich gene-Glioma 
Inactivated) gene, which is rearranged as a result of the t(10;19)(q24;q13) balanced 
translocation in the T98G glioblastoma cell line lacking any normal chromosome 10 (See 
Chernova et al., Oncogene 17: 2873-2881). Rearrangement of the LGI1 gene was also 
detected in the A172 glioblastoma cell line and several glioblastoma tumors. These 
rearrangements lead to a complete absence of LGI1 expression in glioblastoma cells. The 
LGI1 gene encodes a protein that has a calculated molecular mass of 60 kD and contains 3.5 
leucine-rich repeats (LRR) with conserved flanking sequences. In the LRR domain, LGI1 has 
the highest homology with a number of transmembrane and extracellular proteins which 
function as receptors and adhesion proteins. LGI1 is predominantly expressed in neural 
tissues, especially in brain; its expression is reduced in low grade brain tumors and it is 
significantly reduced or absent in malignant gliomas. Its localization to the 10q24 region, and 
rearrangements or inactivation in malignant brain tumors, suggest that LGI1 is a candidate 
tumor suppressor gene involved in progression of glial tumors. 

The human leucine-rich glioma-inactivated protein precursor-like protein described in 
this invention is predicted to share the attributes of other family members and is thus 
implicated in regulation of cell growth and survival as well as cellular metabolism. Like the 
LGI1 gene, the leucine-rich glioma-inactivated protein precursor-like gene described in this 
patent is expressed in neural tissues; however, it also appears to be frequently expressed in 
parathyroid tumors. Therefore, this protein is an attractive target for drug intervention in the 
treatment of cancer, central nervous system disorders, and metabolic diseases, among others. 
The leucine-rich glioma-inactivated protein precursor-like gene maps to human chromosome 
4p16 and is predicted to encode a secreted protein. 

The disclosed NOV54 nucleic acid of the invention encoding a Leucine-rich 
glioma-inactivated protein precursor-like protein includes the nucleic acid whose sequence is 
provided in Table 54A or a fragment thereof. The invention also includes a mutant or variant 
nucleic acid any of whose bases may be changed from the corresponding base shown in Table 54A 
while still encoding a protein that maintains its Leucine-rich glioma-inactivated protein 
precursor-like activities and physiological functions, or a fragment of such a nucleic acid. The 
invention further includes nucleic acids whose sequences are complementary to those just 
described, including nucleic acid fragments that are complementary to any of the nucleic acids 
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or 
complements thereto, whose structures include chemical modifications. Such modifications 
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least 
in part to enhance the chemical stability of the modified nucleic acid, such that they may be 
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. 
In the mutant or variant nucleic acids, and their complements, up to about 36 percent of the 
bases may be so changed. 
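
The notions of a complementary nucleic acid and of a variant differing in a bounded fraction of bases can be illustrated with a small calculation. The Python sketch below is a simplified illustration under stated assumptions: it handles unmodified DNA bases only (A, C, G, T), compares sequences of equal length by substitution alone (no insertions or deletions), and treats "about 36 percent" as a hard 0.36 threshold. The function names are hypothetical and do not define the scope of the variants described above.

```python
# Watson-Crick complement for unmodified DNA bases (assumption: A/C/G/T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def fraction_changed(reference, variant):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(a != b for a, b in zip(reference, variant))
    return mismatches / len(reference)


def within_limit(reference, variant, limit=0.36):
    """True if the variant differs in no more than `limit` of its bases."""
    return fraction_changed(reference, variant) <= limit


# First 12 bases of Table 54A and a hypothetical single-base variant.
reference = "ATGGCGCTGCGG"
variant = "ATGGCGTTGCGG"

print(reverse_complement(reference))
print(fraction_changed(reference, variant))
print(within_limit(reference, variant))
```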

The disclosed NOV54 protein of the invention includes the Leucine-rich glioma- 
inactivated protein precursor-like protein whose sequence is provided in Table 54B. The 
invention also includes a mutant or variant protein any of whose residues may be changed 
from the corresponding residue shown in Table 54B while still encoding a protein that 
maintains its Leucine-rich glioma-inactivated protein precursor-like activities and 
physiological functions, or a functional fragment thereof. In the mutant or variant protein, up 
to about 45 percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or 
(Fab)2, that bind immunospecifically to any of the proteins of the invention. 

The above defined information for this invention suggests that this Leucine-rich 
glioma-inactivated protein precursor-like protein (NOV54) may function as a member of a 
"Leucine-rich glioma-inactivated protein precursor family". Therefore, the NOV54 nucleic 
acids and proteins identified here may be useful in potential therapeutic applications 
implicated in (but not limited to) various pathologies and disorders as indicated below. The 
potential therapeutic applications for this invention include, but are not limited to: protein 
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug 
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene 
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues 
and cell types composing (but not limited to) those defined here. 

The NOV54 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the Leucine-rich glioma- 
inactivated protein precursor-like protein (NOV54) may be useful in gene therapy, and the 
Leucine-rich glioma-inactivated protein precursor-like protein (NOV54) may be useful when 
administered to a subject in need thereof. By way of nonlimiting example, the compositions 
of the present invention will have efficacy for treatment of patients suffering from cancer, 
trauma, bacterial and viral infections, in vitro and in vivo regeneration, Von Hippel-Lindau 
(VHL) syndrome, Alzheimer's disease, stroke, tuberous sclerosis, hypercalcemia, Parkinson's 
disease, Huntington's disease, cerebral palsy, epilepsy, Lesch-Nyhan syndrome, multiple 
sclerosis, ataxia-telangiectasia, leukodystrophies, behavioral disorders, addiction, anxiety, 
pain, neurodegeneration, anemia, bleeding disorders, scleroderma, transplantation, 
hyperparathyroidism, hypoparathyroidism, diabetes, autoimmune disease, renal artery 
stenosis, interstitial nephritis, glomerulonephritis, polycystic kidney disease, systemic lupus 
erythematosus, renal tubular acidosis, IgA nephropathy, Lesch-Nyhan syndrome, 
Hirschsprung's disease, Crohn's Disease, appendicitis, endometriosis, fertility, 
cardiomyopathy, atherosclerosis, hypertension, congenital heart defects, aortic stenosis, atrial 
septal defect (ASD), atrioventricular (A-V) canal defect, ductus arteriosus, pulmonary 
stenosis, subaortic stenosis, ventricular septal defect (VSD), valve diseases, obesity, 
transplantation, and pancreatitis, or other pathologies or conditions. The NOV54 nucleic acid 
encoding the Leucine-rich glioma-inactivated protein precursor-like protein of the invention, 
or fragments thereof, may further be useful in diagnostic applications, wherein the presence or 
amount of the nucleic acid or the protein is to be assessed. 

NOV54 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV54 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the 
"Anti-NOVX Antibodies" section below. The disclosed NOV54 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 
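
Prediction of hydrophilic regions from a hydrophobicity chart can be sketched as a sliding-window average over the amino acid sequence. The text does not name a particular scale or window size, so the Python example below makes two explicit assumptions: it uses the Kyte-Doolittle hydropathy values and a 7-residue window. The function names are hypothetical, and the input is the second 60-residue line of Table 54B. Lower averages indicate more hydrophilic segments, which are the kind of region the text proposes as candidate immunogens.

```python
# Kyte-Doolittle hydropathy values (assumption: the text names no specific scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Mean hydropathy of every window of `window` consecutive residues."""
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[residue] for residue in protein]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def most_hydrophilic_window(protein, window=7):
    """Return (start index, segment, score) of the lowest-scoring window."""
    profile = hydropathy_profile(protein, window)
    start = min(range(len(profile)), key=profile.__getitem__)
    return start, protein[start:start + window], profile[start]


# Residues 61-120 of the NOV54 polypeptide as listed in Table 54B.
segment = "GDISSLSLVNGTFSEIKDRMFSHLPSLQLLLLNSNSFTIIRDDAFAGLFHLEYLFIEGNK"

start, peptide, score = most_hydrophilic_window(segment)
print(start, peptide, round(score, 2))
```

The window size trades resolution against noise; a longer window smooths the profile, while a shorter one picks out smaller hydrophilic stretches.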

NOV55 

A disclosed NOV55 nucleic acid of 2886 nucleotides (also referred to as CG57306-01) 
encoding an anion exchanger-like protein is shown in Table 55A. The start and stop codons 
are in bold letters. 



Table 55A. NOV55 nucleotide sequence (SEQ ID NO:127). 



ATTCTGTGCAAGCCTCATGGAAATGAAGCTGCCAGGCCAGGAAGGGTTTGAAGCCTCCAG 
TGCTCCTAGAAATATTCCTTCAGGGGAGCTGGACAGCAACCCTGACCCTGGCACCGGCCC 
CAGCCCTGATGGCCCCTCAGACACAGAGAGCAAGGAACTGGGAGTACCCAAAGACCCTCT 
GCTCTTCATTCAGCTGAATGAGCTGCTGGGCTGGCCCCAGGCGCTGGAGTGGAGAGAGAC 
AGGCCGATGGGTACTGTTTGAGGAGAAGTTGGAGGTGGCTGCAGGCCGGTGGAGTGCCCC 
CCACGTGCCCACCCTGGCACTGCCCAGCCTCCAGAAGCTCCGCAGCCTGCTGGCCGAGGG 
CCTTGTACTGCTGGACTGCCCAGCTCAGAGCCTCCTGGAGCTCGTGGAGCAGGTGACCAG 
GGTGGAGTCGCTGAGCCCAGAGCTGAGAGGGCAGTTGCAGGCCTTGCTGCTGCAGAGACC 
CCAGCATTACAACCAGACCACAGGCACCAGGCCCTGCTGGGGTGAGAGCCCCTCCAGAAA 
GGCTTCTGACAATGAGGAAGCCCCCCTGAGGGACCAGTGTCAGAACCCCCTGAGACAGAA 
GCTACCTCCAGGAGCTGAGGCAGGGACTGTGCTGGCAGGGGAGCTGGGCTTCCTGGCACA 
GCCACTGGGAGCCTTTGTTCGACTGCGGAACCCTGTGGTACTGGGGTCCCTTACTGAGGT 
GTCCCTCCCAAGCAGGTTTTTCTGCCTTCTCCTGGGCCCCTGTATGCTGGGAAAGGGCTA 
CCATGAGATGGGACGGGCAGCAGCTGTCCTCCTCAGTGACCCGCATTCCCAGCAATTCCA 
GTGGTCAGTTCGTCGGGCCAGCAACCTTCATGACCTTCTGGCAGCCCTGGATGCATTCCT 
AGAGGAGGTGACAGTGCTTCCCCCAGGTCGGTGGGACCCAACAGCCCGGATTCCCCCGCC 
CAAATGTCTGCCATCTCAGCACAAAAGGACCTCGGCTGAGGACAGGCACCGCCATGGGCC 
ACACGCACACAGCCCGGAGTTGCAGCGGACCGGCAGGCTGTTTGGGGGCCTTATCCAGGA 
CGTGCGCAGGAAGGTCCCGTGGTACCCCAGCGATTTCTTGGACGCCCTGCATCTCCAGTG 
CTTCTCGGCCGTACTCTACATTTACCTGGCCACTGTCACTAATGCCATCACTTTTGGGGG 
TCTGCTGGGAGATGCCACTGATGGTGCCCAGGGAGTGCTGGAAAGTTTCCTGGGCACAGC 
AGTGGCTGGAGCTGCCTTCTGCCTGATGGCAGGCCAGCCCCTCACCATTCTGAGCAGCAC 
GGGGCCAGTGCTGGTCTTTGAGCGCCTGCTCTTCTCTTTCAGCAGAGATTACAGCCTGGA 
CTACCTGCCCTTCCGCCTATGGGTGGGCATCTGGGTGGCTACCTTTTGCCTGGTGCTGGT 
GGCCACAGAGGCCAGTGTGCTGGTGCGCTACTTCACCCGCTTCACTGAGGAAGGTTTCTG 
TGCCCTCATCAGCCTCATCTTCATCTACGATGCTGTGGGCAAAATGCTGAACTTGACCCA 
TACCTATCCTATCCAGAAGCCTGGGTCCTCTGCCTACGGGTGCCTCTGCCAATACCCAGG 
CCCAGGAGGTAATGAGTCTCAATGGATAAGGACAAGGCCAAAAGACAGAGACGACATTGT 
AAGCATGGACTTAGGCCTGATCAATGCATCCTTGCTGCCGCCACCTGAGTGCACCCGGCA 
GGGAGGCCACCCTCGTGGCCCTGGCTGTCATACAGTCCCAGACATTGCCTTCTTCTCCCT 
TCTCCTCTTCCTTACTTCTTTCTTCTTTGCTATGGCCCTCAAGTGTGTAAAGACCAGCCG 
CTTCTTCCCCTCTGTGGTGCGCAAAGGGCTCAGCGACTTCTCCTCAGTCCTGGCCATCCT 
GCTCGGCTGTGGCCTTGATGCTTTCCTGGGCCTAGCCACACCAAAGCTCATGGTACCCAG 
AGAGTTCAAGCCCACACTCCCTGGGCGTGGCTGGCTGGTGTCACCTTTTGGAGCCAACCC 
CTGGTGGTGGAGTGTGGCAGCTGCCCTGCCTGCCCTGCTGCTGTCTATCCTCATCTTCAT 
GGACCAACAGATCACAGCAGTCATCCTCAACCGCATGGAATACAGACTGCAGAAGGGAGC 
TGGCTTCCACCTGGACCTCTTCTGTGTGGCTGTGCTGATGCTACTCACATCAGCGCTTGG 
ACTGCCTTGGTATGTCTCAGCCACTGTCATCTCCCTGGCTCACATGGACAGTCTTCGGAG 
AGAGAGCAGAGCCTGTGCCCCCGGGGAGCGCCCCAACTTCCTGGGTATCAGGGAACAGAG 
GCTGACAGGCCTGGTGGTGTTCATCCTTACAGGAGCCTCCATCTTCCTGGCACCTGTGCT 
CAAGTTCATTCCAATGCCTGTGCTCTATGGCATCTTCCTGTATATGGGGGTGGCAGCGCT 
CAGCAGCATTCAGTTCACTAATAGGGTGAAGCTGTTGTTGATGCCAGCAAAACACCAGCC 
AGACCTGCTACTCTTGCGGCATGTGCCTCTGACCAGGGTCCACCTCTTCACAGCCATCCA 
GCTTGCCTGTCTGGGGCTGCTTTGGATAATCAAGTCTACCCCTGCAGCCATCATCTTCCC 
CCTCATGTTGCTGGGCCTTGTGGGGGTCCGAAAGGCCCTGGAGAGGGTCTTCTCACCACA 
GGAACTCCTCTGGCTGGATGAGCTGATGCCAGAGGAGGAGAGAAGCATCCCTGAGAAGGG 
GCTGGAGCCAGAACACTCATTCAGTGGAAGTGACAGTGAAGATTCAGAGCTGATGTATCA 
GCCAAAGGCTCCAGAAATCAACATTTCTGTGAATTAGCTGGAGTAGGAGTCTGGGAGTGG 
AGACCC 



In a search of public sequence databases, the NOV55 nucleic acid sequence, located on 
chromosome 17, has 2250 of 2788 bases (80%) identical to a gb:GENBANK- 
ID:AB038264|acc:AB038264.1 mRNA from Oryctolagus cuniculus (Oryctolagus cuniculus 
AE4b mRNA for anion exchanger 4b, complete cds). Public nucleotide databases include all 
GenBank databases and the GeneSeq patent database. 

The disclosed NOV55 polypeptide (SEQ ID NO: 128) encoded by SEQ ID NO: 127 
has 946 amino acid residues and is presented in Table 55B using the one-letter amino acid 
code. Signal P, Psort and/or Hydropathy results predict that NOV55 has no signal peptide and 
is likely to be localized extracellularly with a certainty of 0.8000. 



Table 55B. Encoded NOV55 protein sequence (SEQ ID NO: 128). 

MEMKLPGQEGFEASSAPRNIPSGELDSNPDPGTGPSPDGPSDTESKELGVPKDPLLFIQL 
NELLGWPQALEWRETGRWVLFEEKLEVAAGRWSAPHVPTLALPSLQKLRSLLAEGLVLLD 
CPAQSLLELVEQVTRVESLSPELRGQLQALLLQRPQHYNQTTGTRPCWGESPSRKASDNE 
EAPLRDQCQNPLRQKLPPGAEAGTVLAGELGFLAQPLGAFVRLRNPVVLGSLTEVSLPSR 
FFCLLLGPCMLGKGYHEMGRAAAVLLSDPHSQQFQWSVRRASNLHDLLAALDAFLEEVTV 
LPPGRWDPTARIPPPKCLPSQHKRTSAEDRHRHGPHAHSPELQRTGRLFGGLIQDVRRKV 
PWYPSDFLDALHLQCFSAVLYIYLATVTNAITFGGLLGDATDGAQGVLESFLGTAVAGAA 
FCLMAGQPLTILSSTGPVLVFERLLFSFSRDYSLDYLPFRLWVGIWVATFCLVLVATEAS 
VLVRYFTRFTEEGFCALISLIFIYDAVGKMLNLTHTYPIQKPGSSAYGCLCQYPGPGGNE 
SQWIRTRPKDRDDIVSMDLGLINASLLPPPECTRQGGHPRGPGCHTVPDIAFFSLLLFLT 
SFFFAMALKCVKTSRFFPSVVRKGLSDFSSVLAILLGCGLDAFLGLATPKLMVPREFKPT 
LPGRGWLVSPFGANPWWWSVAAALPALLLSILIFMDQQITAVILNRMEYRLQKGAGFHLD 
LFCVAVLMLLTSALGLPWYVSATVISLAHMDSLRRESRACAPGERPNFLGIREQRLTGLV 
VFILTGASIFLAPVLKFIPMPVLYGIFLYMGVAALSSIQFTNRVKLLLMPAKHQPDLLLL 
RHVPLTRVHLFTAIQLACLGLLWIIKSTPAAIIFPLMLLGLVGVRKALERVFSPQELLWL 
DELMPEEERSIPEKGLEPEHSFSGSDSEDSELMYQPKAPEINISVN 



A search of sequence databases reveals that the NOV55 amino acid sequence has 827 
of 944 amino acid residues (87%) identical to, and 873 of 944 amino acid residues (92%) 
similar to, the 939 amino acid residue ptnr:TREMBLNEW-ACC:BAB18936 protein from 
Oryctolagus cuniculus (Rabbit) (ANION EXCHANGER 4B). Public amino acid databases 
include the GenBank databases, SwissProt, PDB and PIR. 

NOV55 is expressed in at least kidney and testis. This information was derived by 
determining the tissue sources of the sequences that were included in the invention including 
but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE 
sources. 

The disclosed NOV55 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 55C. 



Table 55C. BLAST results for NOV55 

Gene Index/Identifier                                Protein/Organism                                                                    Length (aa)  Identity (%)    Positives (%)   Expect 
gi|18560973|ref|XP_038736.2| (XM_038736)             solute carrier family 4, sodium bicarbonate cotransporter, member 9 [Homo sapiens]  959          938/962 (97%)   940/962 (97%)   0.0 
gi|14582760|gb|AAK69625.1|AF332961_1 (AF332961)      anion exchanger AE4 [Homo sapiens]                                                  959          937/962 (97%)   939/962 (97%)   0.0 
gi|7363254|dbj|BAA93010.1| (AB032762)                sodium bicarbonate cotransporter 5 [Homo sapiens]                                   957          936/960 (97%)   938/960 (97%)   0.0 
gi|13517508|gb|AAK28832.1|AF313465_1 (AF313465)      sodium bicarbonate cotransporter [Homo sapiens]                                     990          938/986 (95%)   940/986 (95%)   0.0 
gi|13249295|gb|AAK16733.1|AF336237_1 (AF336237)      anion exchanger AE4 [Homo sapiens]                                                  945          922/962 (95%)   926/962 (95%)   0.0 

Table 55D lists the domain descriptions from DOMAIN analysis results against 
NOV55. This indicates that the NOV55 sequence has properties similar to those of other 
proteins known to contain this domain. 

Table 55D. Domain Analysis of NOV55 

gnl|Pfam|pfam00955, HCO3_cotransp, HCO3- transporter family. This 
family contains Band 3 anion exchange proteins that exchange Cl-/HCO3-. 
This family also includes cotransporters of Na+/HCO3-. 

CD-Length = 781 residues, 100.0% aligned 

Score = 731 bits (1887), Expect = 0.0 



The rabbit anion exchanger 4B protein is a member of the bicarbonate ion transporter 
superfamily and is present on the apical membrane of beta-intercalated cells in the collecting 
ducts of the rabbit kidney (See Tsuganezawa et al., J Biol Chem 2000 Dec 1). The rabbit 
protein has sodium-independent anion exchanger activity when expressed in cultured COS-7 
cells and Xenopus oocytes. 

The acid-secreting alpha intercalated cells and bicarbonate-secreting beta intercalated 
cells are sites for modulation of urinary acid secretion, which in turn governs acid-base 
homeostasis. Mutations in the red cell anion exchanger gene, for instance, are correlated with 
familial distal renal tubular acidosis (See Bruce et al., Biochem Cell Biol 1998; 76(5): 
723-728). 

The disclosed NOV55 nucleic acid of the invention encoding an anion exchanger-like 
protein includes the nucleic acid whose sequence is provided in Table 55A or a fragment 
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may 
be changed from the corresponding base shown in Table 55A while still encoding a protein 
that maintains its anion exchanger-like activities and physiological functions, or a fragment of 
such a nucleic acid. The invention further includes nucleic acids whose sequences are 
complementary to those just described, including nucleic acid fragments that are 
complementary to any of the nucleic acids just described. The invention additionally includes 
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include 
chemical modifications. Such modifications include, by way of nonlimiting example, 
modified bases, and nucleic acids whose sugar phosphate backbones are modified or 
derivatized. These modifications are carried out at least in part to enhance the chemical 
stability of the modified nucleic acid, such that they may be used, for example, as antisense 
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic 
acids, and their complements, up to about 20 percent of the bases may be so changed. 

The disclosed NOV55 protein of the invention includes the anion exchanger-like 
protein whose sequence is provided in Table 55B. The invention also includes a mutant or 
variant protein any of whose residues may be changed from the corresponding residue shown 
in Table 55B while still encoding a protein that maintains its anion exchanger-like activities 
and physiological functions, or a functional fragment thereof. In the mutant or variant protein, 
up to about 13 percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or 
(Fab)2, that bind immunospecifically to any of the proteins of the invention. 

The above defined information for this invention suggests that this anion 
exchanger-like protein (NOV55) may function as a member of an "anion exchanger family". 
Therefore, the NOV55 nucleic acids and proteins identified here may be useful in potential 
therapeutic applications implicated in (but not limited to) various pathologies and disorders as 
indicated below. The potential therapeutic applications for this invention include, but are not 
limited to: protein therapeutic, small molecule drug target, antibody target (therapeutic, 
diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene 
therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro 
of all tissues and cell types composing (but not limited to) those defined here. 

The NOV55 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the anion exchanger-like 
protein (NOV55) may be useful in gene therapy, and the anion exchanger-like protein 
(NOV55) may be useful when administered to a subject in need thereof. By way of 
nonlimiting example, the compositions of the present invention will have efficacy for 
treatment of patients suffering from acidosis, alkalosis, diabetes, autoimmune disease, renal 
artery stenosis, interstitial nephritis, glomerulonephritis, polycystic kidney disease, systemic 
lupus erythematosus, renal tubular acidosis, IgA nephropathy, hypercalcemia, Lesch-Nyhan 
syndrome, cancer, tissue degeneration, bacterial/viral/parasitic infection, or other pathologies 
or conditions. The NOV55 nucleic acid encoding the anion exchanger-like protein of the 
invention, or fragments thereof, may further be useful in diagnostic applications, wherein the 
presence or amount of the nucleic acid or the protein is to be assessed. 
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NOV55 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind imrnuno-specifically to the novel NOV55 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti- 
NOVX Antibodies" section below. The disclosed NOV55 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 



NOV56 

A disclosed NOV56 nucleic acid of 1083 nucleotides (also referred to as CG57348-01)
encoding a PR_SET_domain protein-like protein is shown in Table 56A. The start and stop
codons are in bold letters.



Table 56A. NOV56 nucleotide sequence (SEQ ID NO:129). 



GGCGGCGAGGGTGCGGGGCGCGCTGCCATGGGCCTTGGCGGCGCCGCCGCGGCGCTGGTGGCGGCGGCAG 
CAGCAGCAGCAGCGGCAGCGGCAGCGGTGGTGGCCGGGCCGCGGCGGCGGCGGCGAGGGTGCGGGGCGCG 
CTGCCATGGGCCTGGCCGGGCTGCAGGCAAGAAGATGTCCAAGCCCCGCGCGCTGGAGGCGGCGGCGGCG 
GCGGCAGCGACGGCCCCGGGCCTGGAGATGGTGGAGCGGAGGGGCCCGGGGAGGCCCCGCACCGATGGGG 
AGAGCGTATTTACCGGGCAGTCAAAGATCTATTCCTACATGAGCCCGAACAAATGCTCTGGAATGCGTTT 
CCCCCTTCAAGAAGAGAACTCGGTTACACATCACGAAGTCAAATGCCAGGGGAAACCATTAGCCGGAATC
TACAGGAAACGAGAAGAGAAAAGAAATACTGGGAACGCAGTACAGAGCGCCATGAAGTCCAAGAAACAGA
AGATCAAAGACGCCAGGAGAGGTCCCCTGCAAGGAAAAACACAACAGAATCACAAACTTACGGATTTCTA
CCCTGTCCGAAGGAGATCCAGGAAGAGCAAAGCCGAGCTGCAGTCTGAAGAAAGGAAAAGAATAGATGAA
TTGATTGAAAGTGGGAAGGAAGAAGGAATGAAGATTGACCTCATCGATGGCAAAGGCAGGGGTGTGATTG
CCACCAAGCATTTCTCCCGGGGTGCCTTTGTGGTGGAATACCACGGGGACCTCATCGAGATCACCGACGC
CAAGAAACGGGAGGCTCTGTATGCACAGGACCCTTCCACGGGCTGCTACATGTACTATTTTCAGTATCTG
AGCAAAACCTACTGCGTGGATGCAACTAGAGAGACAAATCGCCCAGGAAGACCGATCAATCACAGCAAAT
GTGGGAACTGCCAAACCAAACTGCACGAC^TCGACGGCGTACCTC^ 

CATCGCGGCTGGGGAGGAGCTCCTGTATGACTATGGGGACCGCAGCAAGGCTTCCATTGAAGCCCACCCG
TGGCTGAAGCATTAACCGGTGGGCCCCGCGCCC 



In a search of public sequence databases, the NOV56 nucleic acid sequence, located on
chromosome 12, has 1036 of 1086 bases (95%) identical to a
gb:GENBANK-ID:AF287261|acc:AF287261.1 mRNA from Homo sapiens (Homo sapiens PR/SET domain
containing protein 07 (SET07) mRNA, complete cds). Public nucleotide databases include all
GenBank databases and the GeneSeq patent database.

The disclosed NOV56 polypeptide (SEQ ID NO:130) encoded by SEQ ID NO:129 has
345 amino acid residues and is presented in Table 56B using the one-letter amino acid code.
SignalP, Psort and/or Hydropathy results predict that NOV56 has a signal peptide and is
likely to be localized in the endoplasmic reticulum with a certainty of 0.600. The most likely
cleavage site for a NOV56 peptide is between amino acids 22 and 23.



Table 56B. Encoded NOV56 protein sequence (SEQ ID NO: 130). 



MGLGGAAAALVAAAAAAAAAAAAVVAGPRRRRRGCGARCHGPGRAAGKKMSKPRALEAAA
AAAATAPGLEMVERRGPGRPRTDGESVFTGQSKIYSYMSPNKCSGMRFPLQEENSVTHHE 
VKCQGKPIAGIYRKREEKRNTGNAVQSAMKSKKQKIKDARRGPLQGKTQQNHKLTDFYPV 
RRRSRKSKAELQSEERKRIDELIESGKEEGMKIDLIDGKGRGVIATKHFSRGAFVVEYHG
DLIEITDAKKREALYAQDPSTGCYMYYFQYLSKTYCVDATRETNRPGRPINHSKCGNCQT
KLHDIDGVPHLILIASQDIAAGEELLYDYGDRSKASIEAHPWLKH 
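
The correspondence between Table 56A and Table 56B can be spot-checked by translating the nucleotide sequence from its first ATG. The following is a minimal sketch, assuming the standard genetic code and assuming that the first ATG in the fragment is the start codon; the function name `translate_from_first_atg` is illustrative and is not part of the disclosure.

```python
from itertools import product

# Standard genetic code in TCAG order (assumed; the text does not name a code table).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_from_first_atg(dna):
    """Translate from the first ATG until a stop codon or the end of the input.

    Returns (protein, stop_found).
    """
    dna = dna.upper().replace(" ", "")
    start = dna.find("ATG")
    if start < 0:
        return "", False
    residues = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks codons with unreadable bases
        if aa == "*":
            return "".join(residues), True
        residues.append(aa)
    return "".join(residues), False


# First line of Table 56A; the translation can be compared with the start of Table 56B.
first_line = "GGCGGCGAGGGTGCGGGGCGCGCTGCCATGGGCCTTGGCGGCGCCGCCGCGGCGCTGGTGGCGGCGGCAG"
protein, stop_found = translate_from_first_atg(first_line)
print(protein, stop_found)
```

Because only one line of the table is supplied, no stop codon is reached and the second return value is False; passing the full sequence would run to the first in-frame stop.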



A search of sequence databases reveals that the NOV56 amino acid sequence has 314
of 345 amino acid residues (91%) identical to, and 322 of 345 amino acid residues (93%)
similar to, the 345 amino acid residue ptnr:SPTREMBL-ACC:Q9NQR1 protein from Homo
sapiens (Human) (PR/SET DOMAIN CONTAINING PROTEIN 07). Public amino acid
databases include the GenBank databases, SwissProt, PDB and PIR.
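
The identity and similarity percentages quoted for NOV56 follow from the match counts. A minimal sketch of that arithmetic is given below; truncating to a whole number is an assumption made here because it reproduces the quoted figures, and `percent_identity` is a hypothetical helper name.

```python
def percent_identity(matches, aligned_length):
    """Whole-number percent of matching positions, truncated (assumed convention)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100 * matches // aligned_length


# Figures quoted for NOV56 in the text.
print(percent_identity(1036, 1086))  # nucleotide identity
print(percent_identity(314, 345))    # amino acid identity
print(percent_identity(322, 345))    # amino acid similarity
```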

NOV56 is expressed in at least Bone Marrow, Brain, Cervix, Dermis, Heart, Kidney,
Liver, Lung, Lymph node, Pancreas, Parietal Lobe, Pituitary Gland, Placenta, Skin, Spinal
Cord, Spleen, Thymus, Thyroid, Umbilical Vein, Adrenal Gland/Suprarenal gland, Aorta,
Ascending Colon, Brain, Buccal mucosa, Cartilage, Cervix, Chorionic Villus, Colon,
Coronary Artery, Duodenum, Heart, Kidney, Liver, Lung, Ovary, Parietal Lobe, Parotid
Salivary glands, Peripheral Blood, Prostate, Retina, Salivary Glands, Small Intestine,
Synovium/Synovial membrane, Testis, Tonsils, Umbilical Vein, and Urinary Bladder.
Expression information was derived from the tissue sources of the sequences that were
included in the derivation of the sequence of CG57348-01. The sequence is predicted to be
expressed in the following tissues because of the expression pattern of (GENBANK-ID:
gb:GENBANK-ID:AF287261|acc:AF287261.1) a closely related Homo sapiens PR/SET
domain containing protein 07 (SET07) mRNA, complete cds homolog in species Homo
sapiens: Bone Marrow, Brain, Cervix, Dermis, Heart, Kidney, Liver, Lung, Lymph node,
Pancreas, Parietal Lobe, Pituitary Gland, Skin, and Colon. This information was derived by
determining the tissue sources of the sequences that were included in the invention including
but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE
sources.

The disclosed NOV56 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 56C.

Table 56C. BLAST results for NOV56

Gene Index/Identifier                     Protein/Organism                     Length (aa)  Identity (%)     Positives (%)    Expect
gi|15295688|ref|XP_004668.4| (XM_004668)  similar to PR/SET domain containing  403          306/359 (85%)    306/359 (85%)    e-149
                                          protein 07 (H. sapiens) [Homo
                                          sapiens]
gi|15042945|ref|NP_065115.2| (NM_020382)  PR/SET domain containing protein 07  393          319/393 (81%)    324/393 (82%)    e-139
                                          [Homo sapiens]
gi|15303323|ref|XP_017300.2| (XM_017300)  PR/SET domain containing protein 07  240          145/165 (87%)    150/165 (90%)    3e-46
                                          [Homo sapiens]
gi|7299871|gb|AAF55047.1| (AE003704)      CG3307 gene product [Drosophila      689          100/230 (43%)    143/230 (61%)    2e-45
                                          melanogaster]
gi|17554790|ref|NP_498417.1| (NM_066016)  T26A5.7.p [Caenorhabditis elegans]   242          83/203 (40%)     125/203 (60%)    2e-33


Table 56D lists the domain descriptions from DOMAIN analysis results against 
NOV56. This indicates that the NOV56 sequence has properties similar to those of other 
proteins known to contain this domain. 

Table 56D. Domain Analysis of NOV56

gnl|Smart|smart00317, SET, SET (Su(var)3-9, Enhancer-of-zeste,
Trithorax) domain; Putative methyl transferase, based on outlier plant
homologues

CD-Length = 125 residues, 96.8% aligned

Score = 107 bits (266), Expect = 1e-24



Association of SET domain and myotubularin-related proteins modulates growth 
control. The PR domain of the Rb-binding zinc finger protein RIZ1 is a protein binding 
interface and is related to the SET domain functioning in chromatin-mediated gene expression. 

SET domains appear to be protein-protein interaction domains. It has been
demonstrated that SET domains mediate interactions with a family of proteins that display
similarity with dual-specificity phosphatases (dsPTPases) [2]. A subset of SET domains has
been called PR domains. These domains are divergent in sequence from other SET domains,
but also appear to mediate protein-protein interaction [3].

The SET domain is a highly conserved, approximately 150-amino acid motif
implicated in the modulation of chromatin structure. It was originally identified as part of a
larger conserved region present in the Drosophila Trithorax protein and was subsequently
identified in the Drosophila Su(var)3-9 and 'Enhancer of zeste' proteins, from which the
acronym SET is derived. Studies have suggested that the SET domain may be a signature of
proteins that modulate transcriptionally active or repressed chromatin states through chromatin
remodeling activities.

By sequencing cDNAs randomly selected from a cDNA library derived from a human 
immature myeloid cell line, Nomura et al. (1994) isolated a cDNA encoding SETDB1, which 
they called KIAA0067. The deduced SETDB1 protein has 1,291 amino acids. Northern blot 
analysis detected SETDB1 expression in all 16 human tissues examined. 

In the course of searching sequence databases for proteins containing SET domains,
Harte et al. (1999) identified the SETDB1 sequence. They determined that SETDB1 has a
C-terminal SET domain that is well-conserved except that it contains a 347-amino acid insertion
between its most highly conserved regions. The authors found that the C. elegans YNCA gene
product is highly similar to SETDB1 and also contains a bifurcated SET domain.

Nomura et al. (1994) mapped the SETDB1 gene to chromosome 1 using a somatic cell
hybrid mapping panel. By FISH and radiation hybrid mapping, Harte et al. (1999) mapped the
SETDB1 gene to 1q21.

The disclosed NOV56 nucleic acid of the invention encoding a PR_SET_domain
protein-like protein includes the nucleic acid whose sequence is provided in Table 56A or a
fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose
bases may be changed from the corresponding base shown in Table 56A while still encoding a
protein that maintains its PR_SET_domain protein-like activities and physiological functions,
or a fragment of such a nucleic acid. The invention further includes nucleic acids whose
sequences are complementary to those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 5 percent of the bases may be so changed.

The disclosed NOV56 protein of the invention includes the PR_SET_domain protein-like
protein whose sequence is provided in Table 56B. The invention also includes a mutant
or variant protein any of whose residues may be changed from the corresponding residue
shown in Table 56B while still encoding a protein that maintains its PR_SET_domain protein-like
activities and physiological functions, or a functional fragment thereof. In the mutant or
variant protein, up to about 9 percent of the residues may be so changed.
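
The "up to about" limits for NOV56 can be turned into approximate position counts. The sketch below assumes the sequence lengths given in the text (1083 nucleotides, 345 residues) and rounds down, which is a choice made here for illustration; `max_changed_positions` is a hypothetical helper. The limits of 5 and 9 percent happen to equal 100 minus the 95% and 91% identities reported for the closest database matches; that correspondence is an observation, not something the text states.

```python
def max_changed_positions(sequence_length, percent_limit):
    """Largest whole number of positions within the stated percentage (rounded down)."""
    if sequence_length < 0 or not 0 <= percent_limit <= 100:
        raise ValueError("length must be non-negative and percent within 0..100")
    return sequence_length * percent_limit // 100


# NOV56 nucleic acid: 1083 nucleotides, up to about 5 percent changed.
print(max_changed_positions(1083, 5))
# NOV56 protein: 345 residues, up to about 9 percent changed.
print(max_changed_positions(345, 9))
```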

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this PR_SET_domain
protein-like protein (NOV56) may function as a member of a "PR_SET_domain protein
family". Therefore, the NOV56 nucleic acids and proteins identified here may be useful in
potential therapeutic applications implicated in (but not limited to) various pathologies and
disorders as indicated below. The potential therapeutic applications for this invention include,
but are not limited to: protein therapeutic, small molecule drug target, antibody target
(therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic
marker, gene therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo
and in vitro of all tissues and cell types composing (but not limited to) those defined here.

The NOV56 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the PR_SET_domain
protein-like protein (NOV56) may be useful in gene therapy, and the PR_SET_domain
protein-like protein (NOV56) may be useful when administered to a subject in need thereof.
By way of nonlimiting example, the compositions of the present invention will have efficacy
for treatment of patients suffering from CNS disorders, brain disorders including epilepsy,
eating disorders, schizophrenia, ADD; cancer; heart disease; inflammation and autoimmune
disorders including Crohn's disease, IBD, allergies, rheumatoid and osteoarthritis,
inflammatory skin disorders, blood disorders; psoriasis, colon cancer, leukemia, AIDS;
thalamus disorders; metabolic disorders including diabetes and obesity; lung diseases such as
asthma, emphysema, cystic fibrosis; pancreatic disorders including pancreatic insufficiency
and cancer; and prostate disorders including prostate cancer, or other pathologies or
conditions. The NOV56 nucleic acid encoding the PR_SET_domain protein-like protein of the
invention, or fragments thereof, may further be useful in diagnostic applications, wherein the
presence or amount of the nucleic acid or the protein is to be assessed.

NOV56 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV56 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV56 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 
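
The text refers to prediction from hydrophobicity charts for choosing hydrophilic regions as immunogens. One common way to produce such a chart is a sliding-window mean over a hydropathy scale. The sketch below assumes the Kyte-Doolittle scale, a 7-residue window and a cutoff of -1.5; none of these choices is specified in the text, and the function names are illustrative.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the text only says "hydrophobicity charts").
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def hydrophilic_windows(protein, window=7, threshold=-1.5):
    """Windows whose mean hydropathy is at or below the (assumed) cutoff."""
    return [(start, protein[start:start + window])
            for start, mean in enumerate(hydropathy_profile(protein, window))
            if mean <= threshold]


# Short stretch taken from Table 56B, used only as an illustration.
for start, segment in hydrophilic_windows("GIYRKREEKRNTGNAVQ"):
    print(start, segment)
```

Windows with strongly negative means are candidate hydrophilic regions; whether a given stretch is a useful immunogen would still have to be established experimentally.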

NOV57 

A disclosed NOV57 nucleic acid of 5896 nucleotides (also referred to as CG57650-01)
encoding a non-muscle myosin heavy chain B-like protein is shown in Table 57A. The
start and stop codons are in bold letters.



Table 57A. NOV57 nucleotide sequence (SEQ ID NO:131). 



GGCGGTGCTGCGGGACGAAGGCGAGGAGGAGGCGGAGGTGGAGCTGGCGGAGAGCGGGAG 
GCGGCTGCGACTGCCGCGGGACCAGATCCAGCGCATGAACCCGCGCCAAGTTCAGCAAGG
CCGAGGACATGGCCGAGCTGACCTGCCTCAACGAGGCCTCGGTCCTGCACAACCTCCGGG
AGCGGTACTACTCCGGCCTCATCTACACGTACTCCGGCCTTTTCTGTGTGGTCATCAACC 
CGTACAAGCAGCTTCCCATCTACACAGAAGCCATTGTGGAGATGTACCGGGGCAAGAAGC 
GCCACGAGGTGCCACCCCACGTGTACGCAGTGACCGAGGGGGCCTATCGGAGCATGCTGC 
AGGATCGTGAGGACCAGTCCATTCTCTGCACGGGAGAGTCTGGAGCTGGGAAGACGGAAA 
ACACCAAGAAGGTCATCCAGTACCTCGCCCACGTGGCGTCGTCTCCAAAGGGCAGGAAGG 
AGCCGGGTGTCCCCGGTGAGCTGGAGCGGCAGCTGCTTCAGGCCAACCCCATCCTAGAGG 
CCTTTGGCAATGCCAAGACAGTGAAGAATGACAACTCCTCCCGATTCGGCAAATTCATCC 
GCATCAACTTTGATGTTGCCGGGTACATCGTGGGCGCCAACATTGAGACCTGTCTGCTGG 
AGAAGTCGCGGGCCATCCGCCAGGCCAAGGACGAGTGCAGCTTCCACATCTTCTACCAGC 
TGCTGGGGGGCGCTGGAGAGCATGGCTGCCGAGAACTCCTCCTCGAGCCCTGCTCCCACT 
ACCGGTTCCTGACCAACGGGCCGTCATCCTCTCCCGGCCAGGAGCGGGAACTCTTCCAGG 
AGACGCTGGAGTCGCTGCGGGTCCTGGGATTCAGCCACGAGGAAATCATCTCCATGCTGC 
GGATGGTCTCAGCAGTTCTCCAGTTTGGCAACATTGCCTTGAAGAGAGAACGGAACACCG 
ATCAAGCCACCATGCCTGACAACACAGCTGCACAGAAGCTCTGCCGCCTCTTGGGACTGG 
GGGTGACGGATTTCTCCCGAGCCTTGCTCACCCCTCGCATCAAAGTTGGCCGAGACTATG 
TGCAGAAAGCCCAGACTAAGGAACAGGCTGACTTCGCGCTGGAGGCCCTGGCCAAGGCCA 
CCTACGAGCGCCTCTTCCGCTGGCTGGTTCTGCGCCTCAACCGGGCCTTGGACCGCAGCC 
CCCGCCAAGGCGCCTCCTTCCTGGGCATCCTGGACATCGCGGGCTTTGAGATCTTCCAGC 
TGAACTCCTTCGAGCAGCTCTGCATCAACTACACCAACGAGAAGCTGCAGCAGCTCTTCA 
ACCACACCATGTTCGTGCTGGAGCAGGAGGAGTACCAGCGTGAGGGCATCCCCTGGACCT 
TCCTCGACTTTGGCCTCGACCTGCAGCCCTGCATCGACCTCATCGAGCGGCCGGCCAACC 
CCCCTGGACTCCTGGCCCTGCTGGATGAGGAGTGCTGGTTCCCGAAGGCCACAGACAAGT 
CGTTTGTGGAGAAGGTAGCCCAGGAGCAGGGCGGCCACCCCAAGTTCCAGCGGCCGAGGC 
ACCTGCGGGATCAGGCCGACTTCAGTGTTCTCCACTACGCGGGCAAGGTCGACTACAAGG 
CCAACGAGTGGCTGATGAAAAACATGGACCCTCTGAATGACAACGTCGCAGCCTTGCTCC 
ACCAGAGCACAGACCGGCTGACGGCAGAGATCTGGAAAGACGTGGAGGGCATCGTGGGGC 
TGGAACAGGTGAGCAGCCTGGGCGACGGCCCACCAGGTGGCCGCCCCCGTCGGGGTATGT 
TCCGGACAGTGGGACAGCTCTACAAGGAGTCCCTGAGCCGCCTCATGGCCACACTCAGCA 
ACACCAACCCCAGTTTTGTCCGCTGCATTGTCCCCAACCACGAGAAGAGGGTCGGGAAGC 
TGGAGCCGCGGCTGGTGCTGGACCAGCTTCGCTGCAACGGGGTCCTGGAGGGCATCCGCA 
TCTGTCGCCAGGGCTTCCCCAACCGCATCCTCTTCCAGGAGTTCCGGCAGCGATACGAGA 
TCCTGACACCCAATGCCATCCCCAAGGGCTTCATGGATGGGAAGCAGGCCTGTGAAAAGA 
TGATCCAGGCGCTGGAACTGGACCCCAACCTCTACCGCGTGGGACAGAGCAAGATCTTCT 
TCCGGGCTGGGGTCCTGGCCCAGCTGGAAGAGGAGCGAGACCTGAAGGTCACCGACATCA 
TCGTCTCCTTCCAGGCAGCTGCCCGGGGATACCTGGCTCGCAGGGCCTTCCAGAAGCGCC
AGCAGCAGCAGAGCGCCCTGAGGGTGATGCAGCGGAACTGCGCGGCCTACCTCAAGCTGA
GACACTGGCAGTGGTGGCGGCTGTTTACCAAGGTGAAGCCACTGCTGCAGGTGACGCGGC 
AGGATGAGGTGCTGCAGGCACGGGCCCAGGAGCTGCAGAAAGTGCAGGAGCTACAGCAGC 
AGAGCGCCCGCGAAGTTGGGGAGCTCCAGGGCCGAGTGACACAGCTGGAAGAGGAGCGCG 
CCCGCCTGGCAGAGCAATTGCGAGCAGAGGCAGAACTGTGTGCAGAGGCCGAGGAGACGC 
GGGGGAGGCTGGCAGCCCGCAAGCAGGAGCTGGAGCTGGTGGTGTCAGAGCTGGAGGCTC 
GCGTGGGCGAGGAGGAGGAGTGCAGCCGTCAAATGCAAACCGAGAAGAAGAGGCTACAGC 
AGCACATACAGGAGCTAGAGGCCCACCTTGAGGCTGAGGAGGGTGCGCGGCAGAAGCTGC 
AGCTGGAGAAGGTGACGACAGAGGCAAAAATGAAGAAATTTGAAGAGGACCTGCTGCTCC 
TGGAAGACCAGAATTCCAAGCTGAGCAAGGAACTGCTGGAAGATCGTCTGGCCGAGTTCT 
CATCCCAGGCAGCTGAGGAGGAGGAGAAGGTCAAGAGCCTCAATAAGCTACGGCTCAAAT 
ATGAGGCCACAATCGCAGACATGGAGGACCGCCTACGGAAGGAGGAGAAGGGTCGCCAGG 
AGCTGGAGAAGCTGAAGCGGAGGCTGGATGGGGAGAGCTCAGAGCTGCAGGAGCAGATGG 
TGGAGCAGCAACAGCGGGCAGAGGAGCTGCGGGCCCAGCTGGGCCGGAAGGAGGAGGAGC 
TGCAGGCTGCCCTGGCCAGGGCAGAAGACGAGGGTGGGGCCCGGGCCCAGCTGCTGAAAT 
CCCTGCGGGAGGCTCAAGCAGCCCTGGCCGAGGCCGAGGCCCAGGAGGACCTGGAGTCTG 
AGCGTGTGGCCAGGACCAAGGCGGAGAAGCAGCGCCGGGACCTGGGCGAGGAGCTGGAGG 
CGCTGCGGGGCGAGCTGGAGGACACGCTGGACTCCACCAACGCACAGCAGGAGCTCAGGT 
CCAAGAGGGAACAGGAGGTGACGGAGCTGAAGAAGACTCTGGAGGAGGAGACTCGCATCC 
ACGAGGCGGCAGTGCAGGAGCTGAGGCAGCGCCACGGCCAGGCCCTGGGGGAGCTGGCGG 
AGCAGCTGGAGCAGGCCCGGAGGAAAGGTGCATGGGAGAAGACCCGGCTGGCCCTGGAGG 
CCGAGGTGTCCGAGCTGCGGGCAGAACTGAGCAGCCTGCAGACTGCACGTCAGGAGGGTG 
AGCAGCGGAGGCGCCGCCTGGAGTTACAGCTGCAGGAGGTGCAGGGCCGGGCTGGTGATG 
GGGAGAGGGCACGAGCGGAGGCTGCTGAGAAGGTCCCTTCCCTGCAGGCTGAACTGGAGA 
ATGTGTCTGGGGCGCTGAACGAGGCTGAGTCCAAAACCATCCGTCTTAGCAAGGAGCTGA 
GCAGCACAGAAGCCCAGCTGCACGATGCCCAGGAGCTGCTGCAGGAGGAGACCAGGGCGA 
AATTGGCCTTGGGGTCCCGGGTGCGAGCCATGGAGGCTGAGGCAGCCGGGCTGCGTGAGC 
AGCTGGAGGAGGAGGCAGCTGCCAGGGAACGGGCGGACCACCAACCACCCTCTCTCTCCT 
CCCCTCAGCTTTCCGAGTGGCGGCGGCGCCAGGAGGAGGAGGCAGGGGCACTGGAGGCAG 
GGGAGGAGGCACGGCGCCGGGCAGCCCGGGAGGCCGAGGCCCTGACCCAGCGCCTGGCAG 
AAAAGACAGAGACCGTGGATCGGCTGGAGCGGGGCCGCCGCCGGCTGGGGCAGGAGCTGG 
ACGACGCCACCATGGACCTGGAGCAGCAGCGGCAGCTTGTGAGCACCCTGGAGAAGAAGC 
AGCGCAAGTTTGACCAGCTTCTGGCAGAGGAGAAGGCAGCTGTACTTCGGGCAGTGGAGG 
AACGTGAGCGGGCCGAGGCAGAGGGCCGGGAGCGTGAGGCTCGGGCCCTGTCACTGACAC 
GGGCACTGGAGGAGGAGCAGGAGGCACGTGAGGAGCTGGAGCGGCAGAACCGGGCCCTGC 
GGGCTGAGCTGGAGGCACTGCTGAGCAGCAAGGATGACGTCGGCAAGAGCGTGCATGAGC 
TGGAACGAGCCTGCCGGGTAGCAGAACAGGCAGCCAATGATCTGCGAGCACAGGTGACAG 
AACTGGAGGATGAGCTGACAGCGGCCGAGGATGCCAAGCTGCGTCTGGAGGTGACTGTGC 
AGGCTCTCAAGACTCAGCATGAGCGTGACCTGCAGGGCCGTGATGAGGCTGGTGAAGAGA 
GGCGGAGGCAGCTGGCCAAGCAGCTGAGAGATGCAGAGGTGGAGCGGGATGAGGAGCGGA 
AGCAGCGCACTCTGGCCGTGGCTGCCCGCAAGAAGCTGGAGGGAGAGCTGGAGGAGCTGA 
AGGCTCAGATGGCCTCTGCCGGCCAGGGCAAGGAGGAGGCGGTGAAGCAGCTTCGCAAGA 
TGCAGGCCCAGATGAAGGAGCTATGGCGGGAGGTGGAGGAGACACGCACCTCCCGGGAGG 
AGATCTTCTCCCAGAATCGGGAAAGTGAAAAGCGCCTCAAGGGCCTGGAGGCTGAGGTGC 
TGCGGCTGCAGGAGGAACTGGCCGCCTCGGACCGTGCTCGGCGGCAGGCCCAGCAGGACC 
GGGATGAGATGGCAGATGAGGTGGCCAATGGTAACCTTAGCAAGGCAGCCATTCTGGAGG 
AGAAGCGTCAGCTGGAGGGGCGCCTGGGGCAGTTGGAGGAAGAGCTGGAGGAGGAGCAGA 
CAACTCAGAGCTGCTCAATGACCGCTACCGCAAGCTGCTCCTGCAGGTAGAGTCACTGAC 
CACAGAGCTGTCAGCTGAGCGCAGTTTCTCAGCCAAGGCAGAGAGCGGGCGGCAGCAGCT
GGAACGGCAGATCCAGGAGCTACGGGGACGCCTGGGTGAGGAGGATGCTGGGGCCCGTGC
CCGCCACAAGATGACCATTGCTGCCCTTGAGTCTAAGTTGGCCCAGGCTGAGGAGCAGCT
AGAGCAAGAGACCAGAGAGCGCATCCTCTCTGGAAAGCTGGTGCCCAAAAGTTAAGAAGC 
GGCTTAAAGAGGTGGTGCTCCAGGTGGAGGAGGAGCGGAGGGTGGCTGACCAGCTCCGGG 
ACCAGCTGGAGAAGGGAAACCTTCGAGTCAAGCAGCTGAAGCGGCAGCTGGAGGAGGCCG 
AGGAGGAGGCATCCCGGGCTCAGGCCGGCCGCCGGAGGCTGCAGCGTGAGCTGGAAGATG 
TCACAGAGTCGGCCGAGTCCATGAACCGTGAAGTGACCACACTGAGGAACCGGCTTCGAC 
GCGGCCCCCTCACCTTCACCANCCGCACGGTGCGCCAGGTCTTCCGACTAGAGGAGGGCG 
TGGCATCCGACGAGGAGGCAGAGGAAGCACAGCCTGGGTCTGGGCCATCCCCTCTCACTC 
CTGCTGCTGCCCATGCTCTGCCCTCCCTTCTGGTTGCTCTGAGGGTTCGGAGCTTCCCTC 
TGGGACTAAAGGAGTGTCCTTTACCCTCCCAGCCTCCCGGCTCTGGCAGAAATAAACTCC 
AACCCGAATGGAAAAA

In a search of public sequence databases, the NOV57 nucleic acid sequence, located on
chromosome 10, has 3509 of 4934 bases (71%) identical to a
gb:GENBANK-ID:RABMHCP|acc:M77812.1 mRNA from Oryctolagus cuniculus (Rabbit myosin heavy
chain mRNA, complete cds). Public nucleotide databases include all GenBank databases and
the GeneSeq patent database.

The disclosed NOV57 polypeptide (SEQ ID NO:132) encoded by SEQ ID NO:131
has 1673 amino acid residues and is presented in Table 57B using the one-letter amino acid
code. SignalP, Psort and/or Hydropathy results predict that NOV57 has no signal peptide and
is likely to be localized in the nucleus with a certainty of 0.9600.


Table 57B. Encoded NOV57 protein sequence (SEQ ID NO: 132). 

MAELTCLNEASVLHNLRERYYSGLIYTYSGLFCVVINPYKQLPIYTEAIVEMYRGKKRHE
VPPHVYAVTEGAYRSMLQDREDQSILCTGESGAGKTENTKKVIQYLAHVASSPKGRKEPG
VPGELERQLLQANPILEAFGNAKTVKNDNSSRFGKFIRINFDVAGYIVGANIETCLLEKS
RAIRQAKDECSFHIFYQLLGGAGEHGCRELLLEPCSHYRFLTNGPSSSPGQERELFQETL 
ESLRVLGFSHEEIISMLRMVSAVLQFGNIALKRERNTDQATMPDNTAAQKLCRLLGLGVT
DFSRALLTPRIKVGRDYVQKAQTKEQADFALEALAKATYERLFRWLVLRLNRALDRSPRQ
GASFLGILDIAGFEIFQLNSFEQLCINYTNEKLQQLFNHTMFVLEQEEYQREGIPWTFLD
FGLDLQPCIDLIERPANPPGLLALLDEECWFPKATDKSFVEKVAQEQGGHPKFQRPRHLR 
DQADFSVLHYAGKVDYKANEWLMKNMDPLNDNVAALLHQSTDRLTAEIWKDVEGIVGLEQ
VSSLGDGPPGGRPRRGMFRTVGQLYKESLSRLMATLSNTNPSFVRCIVPNHEKRVGKLEP 
RLVLDQLRCNGVLEGIRICRQGFPNRILFQEFRQRYEILTPNAIPKGFMDGKQACEKMIQ 
ALELDPNLYRVGQSKIFFRAGVLAQLEEERDLKVTDIIVSFQAAARGYLARRAFQKRQQQ
QSALRVMQRNCAAYLKLRHWQWWRLFTKVKPLLQVTRQDEVLQARAQELQKVQELQQQSA 
REVGELQGRVTQLEEERARLAEQLRAEAELCAEAEETRGRLAARKQELELVVSELEARVG 
EEEECSRQMQTEKKRLQQHIQELEAHLEAEEGARQKLQLEKVTTEAKMKKFEEDLLLLED 
QNSKLSKELLEDRLAEFSSQAAEEEEKVKSLNKLRLKYEATIADMEDRLRKEEKGRQELE 
KLKRRLDGESSELQEQMVEQQQRAEELRAQLGRKEEELQAALARAEDEGGARAQLLKSLR
EAQAALAEAEAQEDLESERVARTKAEKQRRDLGEELEALRGELEDTLDSTNAQQELRSKR 
EQEVTELKKTLEEETRIHEAAVQELRQRHGQALGELAEQLEQARRKGAWEKTRLALEAEV 
SELRAELSSLQTARQEGEQRRRRLELQLQEVQGRAGDGERARAEAAEKVPSLQAELENVS
GALNEAESKTIRLSKELSSTEAQLHDAQELLQEETRAKLALGSRVRAMEAEAAGLREQLE
EEAAARERADHQPPSLSSPQLSEWRRRQEEEAGALEAGEEARRRAAREAEALTQRLAEKT
ETVDRLERGRRRLGQELDDATMDLEQQRQLVSTLEKKQRKFDQLLAEEKAAVLRAVEERE 
RAEAEGREREARALSLTRALEEEQEAREELERQNRALRAELEALLSSKDDVGKSVHELER 
ACRVAEQAANDLRAQVTELEDELTAAEDAKLRLEVTVQALKTQHERDLQGRDEAGEERRR 
QLAKQLRDAEVERDEERKQRTLAVAARKKLEGELEELKAQMASAGQGKEEAVKQLRKMQA 
QMKELWREVEETRTSREEIFSQNRESEKRLKGLEAEVLRLQEELAASDRARRQAQQDRDE 
MADEVANGNLSKAAILEEKRQLEGRLGQLEEELEEEQTTQSCSMTATASCSCR 



A search of sequence databases reveals that the NOV57 amino acid sequence has 1149
of 1661 amino acid residues (69%) identical to, and 1395 of 1661 amino acid residues (83%)
similar to, the 1976 amino acid residue ptnr:SWISSNEW-ACC:P35580 protein from Homo
sapiens (Human) (MYOSIN HEAVY CHAIN, NONMUSCLE TYPE B (CELLULAR
MYOSIN HEAVY CHAIN, TYPE B) (NMMHC-B)). Public amino acid databases include
the GenBank databases, SwissProt, PDB and PIR.

NOV57 is expressed in at least adrenal gland, bone marrow, brain - amygdala, brain -
cerebellum, brain - hippocampus, brain - substantia nigra, brain - thalamus, brain - whole, fetal
brain, fetal kidney, fetal liver, fetal lung, heart, kidney, lymphoma - Raji, mammary gland,
pancreas, pituitary gland, placenta, prostate, salivary gland, skeletal muscle, small intestine,
spinal cord, spleen, stomach, testis, thyroid, trachea and uterus. This information was derived
by determining the tissue sources of the sequences that were included in the invention
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or
RACE sources.

The disclosed NOV57 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 57C.



Table 57C. BLAST results for NOV57

Gene Index/Identifier                     Protein/Organism                      Length (aa)  Identity (%)     Positives (%)    Expect
gi|13928704|ref|NP_113708.1| (NM_031520)  myosin heavy chain 11 [Rattus         1976         1144/1664 (68%)  1389/1664 (82%)  0.0
                                          norvegicus]
gi|212449|gb|AAA48985.1| (M93676)         nonmuscle myosin heavy chain          1976         1148/1664 (68%)  1395/1664 (82%)  0.0
                                          [Gallus gallus]
gi|1346640|sp|P35580|MYHA_HUMAN           Myosin heavy chain, nonmuscle type B  1976         1149/1664 (69%)  1395/1664 (83%)  0.0
                                          (Cellular myosin heavy chain, type B)
                                          (Nonmuscle myosin heavy chain-B)
                                          (NMMHC-B)
gi|212451|gb|AAA48987.1| (M93676)         nonmuscle myosin heavy chain          1997         1148/1685 (68%)  1395/1685 (82%)  0.0
                                          [Gallus gallus]
gi|212450|gb|AAA48986.1| (M93676)         nonmuscle myosin heavy chain          1986         1148/1674 (68%)  1395/1674 (82%)  0.0
                                          [Gallus gallus]



Tables 57D-H list the domain descriptions from DOMAIN analysis results against
NOV57. This indicates that the NOV57 sequence has properties similar to those of other
proteins known to contain these domains.



Table 57D. Domain Analysis of NOV57

gnl|Pfam|pfam00063, myosin_head, Myosin head (motor domain).

CD-Length = 670 residues, 99.6% aligned

Score = 922 bits (2384), Expect = 0.0



Table 57E. Domain Analysis of NOV57

gnl|Smart|smart00242, MYSc, Myosin. Large ATPases.; ATPase; molecular
motor. Muscle contraction consists of a cyclical interaction between
myosin and actin. The core of the myosin structure is similar in fold
to that of kinesin.

CD-Length = 688 residues, 98.4% aligned 

Score = 866 bits (2238), Expect = 0.0 



Table 57F. Domain Analysis of NOV57 

gnl|Pfam|pfam01576, Myosin_tail, Myosin tail. The myosin molecule is a
multi-subunit complex made up of two heavy chains and four light
chains; it is a fundamental contractile protein found in all eukaryote
cell types. This family consists of the coiled-coil myosin heavy chain
tail region. The coiled-coil is composed of the tail from two
molecules of myosin. These can then assemble into the macromolecular
thick filament. The coiled-coil region provides the structural
backbone of the thick filament.

CD-Length = 860 residues, 78.4% aligned

Score = 344 bits (882), Expect = 3e-95



Table 57G. Domain Analysis of NOV57 

gnl|Pfam|pfam01496, V_ATPase_sub_a, V-type ATPase 116kDa subunit
family. This family consists of the 116kDa V-type ATPase (vacuolar
(H+)-ATPases) subunits, as well as V-type ATP synthase subunit i. The
V-type ATPases family are proton pumps that acidify intracellular
compartments in eukaryotic cells, for example yeast central vacuoles,
clathrin-coated and synaptic vesicles. They have important roles in
membrane trafficking processes. The 116kDa subunit (subunit a) in the
V-type ATPase is part of the V0 functional domain responsible for
proton transport. The a subunit is a transmembrane glycoprotein with
multiple putative transmembrane helices; it has a hydrophilic amino
terminal and a hydrophobic carboxy terminal. It has roles in proton
transport and assembly of the V-type ATPase complex. This subunit is
encoded by two homologous genes in yeast, VPH1 and STV1.

CD-Length = 703 residues, 34.3% aligned 

Score = 57.0 bits (136), Expect = 8e-09 



Table 57H. Domain Analysis of NOV57 

gnl|Pfam|pfam00769, ERM, Ezrin/radixin/moesin family. This family
of proteins contain a band 4.1 domain (pfam00373), at their amino terminus.
This family represents the rest of these proteins.

CD-Length = 365 residues, 43.6% aligned

Score = 52.0 bits (123), Expect = 3e-07
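
The domain-analysis entries in Tables 57D-H share a fixed two-line summary (CD-Length with percent aligned, then bit score, raw score and Expect). The sketch below shows one way to read those summaries into records and to estimate how many domain residues were aligned; the regular expressions, the rounding, and the name `parse_domain_hit` are assumptions made for illustration and do not describe the tool that produced the tables.

```python
import re

# Patterns written against the summary lines as printed in Tables 57D-H (assumed layout).
CD_RE = re.compile(r"CD-Length = (\d+) residues, ([\d.]+)% aligned")
SCORE_RE = re.compile(r"Score = ([\d.]+) bits \((\d+)\), Expect = (\S+)")


def parse_domain_hit(text):
    """Extract length, coverage, scores and Expect from one domain summary."""
    cd = CD_RE.search(text)
    score = SCORE_RE.search(text)
    if cd is None or score is None:
        raise ValueError("summary lines not found")
    cd_length = int(cd.group(1))
    percent_aligned = float(cd.group(2))
    return {
        "cd_length": cd_length,
        "percent_aligned": percent_aligned,
        # Approximate count of domain residues covered by the alignment.
        "aligned_residues": round(cd_length * percent_aligned / 100),
        "bit_score": float(score.group(1)),
        "raw_score": int(score.group(2)),
        "expect": float(score.group(3)),
    }


# Summary lines of Table 57D.
table_57d = "CD-Length = 670 residues, 99.6% aligned\nScore = 922 bits (2384), Expect = 0.0"
print(parse_domain_hit(table_57d))
```

Read this way, the myosin head and MYSc entries cover nearly the whole domain model, while the V-type ATPase and ERM entries cover well under half of theirs.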

Myosins constitute a large superfamily of actin-dependent molecular motors.
Phylogenetic analysis currently places myosins into 15 classes. The conventional myosins,
which form filaments in muscle and non-muscle cells, form class II. There has been extensive
characterization of these myosins and much is known about their function. With the exception
of class I and class V myosins, little is known about the structure, enzymatic properties,
intracellular localization and physiology of most unconventional myosin classes. (See Sellers
JR, 2000, Biochim Biophys Acta 1496:3-22). The discovery of a huge diversity within the
myosin superfamily has been coupled with an understanding of the role of these motor
proteins in various cellular functions. Extensive studies have revealed that myosin isoforms
are not only involved in muscle contraction but also in crucial functions of many specialized
mammalian cells such as melanocytes, kidney and intestinal brush border microvilli, nerve
growth cones or inner ear hair cells. A search for genes involved in the pathology of human
genetic deafness resulted in identification of three novel myosins: myosin VI, myosin VIIA
and, very recently, myosin XV. Recently, mutations have been detected within these genes
that have been found to affect the hearing process (See Redowicz MJ, 1999, J Muscle Res Cell
Motil 20:241-8). Class II non-muscle myosins are implicated in diverse biological processes
such as cytokinesis, cellularization, cell shape changes and gastrulation. Two distinct
non-muscle myosin heavy chain genes have been reported in all vertebrates: non-muscle myosin
heavy chain-A (NMHC-A) and -B (NMHC-B). Whole mount in situ hybridization with tailbud
stage embryos of Xenopus showed that NMHC-A mRNA is predominantly expressed in the
epidermis, whereas NMHC-B mRNA is expressed in the somites, brain, eyes and branchial
arches. Interestingly, the expression of NMHC-B in developing somites was gradually
restricted to the center of each somite as differentiation proceeds. DAPI nuclear staining
demonstrated that NMHC-B mRNA is colocalized with the nuclei or perinuclear area. In
animal cap experiments, treatment with activin A or ectopic expression of Xbra and an
activated form of Xlim1 markedly up-regulates NMHC-B as well as muscle actin mRNAs and
slightly down-regulates NMHC-A mRNA, consistent with NMHC-B expression in the somitic
muscle and NMHC-A expression in the epidermis. (See Bhatia et al., 1998, Mech Dev
78:33-6).

The disclosed NOV57 nucleic acid of the invention encoding a non-muscle myosin 
heavy chain B-like protein includes the nucleic acid whose sequence is provided in Table 57A 
30 or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of 
whose bases may be changed from the corresponding base shown in Table 57A while still 
encoding a protein that maintains its non-muscle myosin heavy chain B-like activities and 
physiological functions, or a fragment of such a nucleic acid. The invention further includes 
nucleic acids whose sequences are complementary to those just described, including nucleic
acid fragments that are complementary to any of the nucleic acids just described. The
invention additionally includes nucleic acids or nucleic acid fragments, or complements 
thereto, whose structures include chemical modifications. Such modifications include, by way 
of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones 
are modified or derivatized. These modifications are carried out at least in part to enhance the 
chemical stability of the modified nucleic acid, such that they may be used, for example, as 
antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or 
variant nucleic acids, and their complements, up to about 29 percent of the bases may be so 
changed. 

The disclosed NOV57 protein of the invention includes the non-muscle myosin heavy 
chain B-like protein whose sequence is provided in Table 57B. The invention also includes a 
mutant or variant protein any of whose residues may be changed from the corresponding 
residue shown in Table 57B while still encoding a protein that maintains its non-muscle 
myosin heavy chain B-like activities and physiological functions, or a functional fragment 
thereof. In the mutant or variant protein, up to about 31 percent of the residues may be so
changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this non-muscle myosin 
heavy chain B-like protein (NOV57) may function as a member of a "non-muscle myosin 
heavy chain B family". Therefore, the NOV57 nucleic acids and proteins identified here may 
be useful in potential therapeutic applications implicated in (but not limited to) various 
pathologies and disorders as indicated below. The potential therapeutic applications for this 
invention include, but are not limited to: protein therapeutic, small molecule drug target, 
antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or 
prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue 
regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to) 
those defined here. 

The NOV57 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the non-muscle myosin 
heavy chain B-like protein (NOV57) may be useful in gene therapy, and the non-muscle 
myosin heavy chain B-like protein (NOV57) may be useful when administered to a subject in 
need thereof. By way of nonlimiting example, the compositions of the present invention will
have efficacy for treatment of patients suffering from hypertension and vasospasm of the
coronary and cerebral arteries, coronary artery spasm, atherosclerosis, hypertrophic
cardiomyopathy, inflammatory diseases such as asthma, cancer, or other pathologies or
conditions. The NOV57 nucleic acid encoding the non-muscle myosin heavy chain B-like
protein of the invention, or fragments thereof, may further be useful in diagnostic applications,
wherein the presence or amount of the nucleic acid or the protein is to be assessed.

NOV57 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV57 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV57 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.

NOV58 

A disclosed NOV58 nucleic acid of 688 nucleotides (also referred to as CG57766-01) 
encoding a plasma retinol binding protein-like protein is shown in Table 58A. The start and
stop codons are in bold letters. 




Table 58A. NOV58 nucleotide sequence (SEQ ID NO:133). 



CCCAACCACGGCCAGGCTTGCGCGCGGTTCCCCTCCCGGTGGGCGGATTCCTGGGCAAGATGAAGTGGGT
GTGGGCGCTCTTGCTGTTGGCGGCGCTGGGCAGCGGCCGCGCGGAGCGCGACTGCCGAGTGAGCAGCTTC 
CGAGTCAAGGAGAACTTCGACAAGGCTCGCTTCTCTGGGACCTGGTACGCCATGGCCAAGAAGGACCCCG 
AGGGCCTCTTTCTGCAGGACAACATCGTCGCGGAGTTCTCCGTGGACGAGACCGGCCAGATGAGCGCCAC 
AGCCAAGGGCCGAGTCCGTCTTTTGAATAACTGGGACGTGTGCGCAGACATGGTGGGCACCTTCACAGAC 
ACCGAGGACCCTGCCAAGTTCAAGATGAAGTACTGGGGCGTAGCCTCCTTTCTCCAGAAAGGAAATGATG 
ACCACTGGATCGTCGACACAGACTACGACACGTATGCCGTGCAGTACTCCTGCCGCCTCCTGAACCTCGA 
TGGCACCTGTGCTGACAGCTACTCCTTCGTGTTTTCCCGGGACCCCAACGGCCTGCCCCCAGAAGCGCAG 
AAGATTGTAAGGCAGCGGCAGGAGGAGCTGTGCCTGGCCAGGCAGTACAGGCTGATCGTCCACAACGGTT 
ACTGCGATGGCAGATCAGAAAGAAACCTTTTGTAGCAAGGGCGAATTCCAGCACACTG



In a search of public sequence databases, the NOV58 nucleic acid sequence, located on
chromosome 10q23-24, has 657 of 673 bases (97%) identical to a
gb:GENBANK-ID:HSRBP1|acc:X00129.1 mRNA from Homo sapiens (Human mRNA for retinol binding
protein (RBP)). Public nucleotide databases include all GenBank databases and the GeneSeq
patent database.
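
The percentage quoted above follows directly from the match counts. The sketch below assumes that the percentage is truncated to a whole number, a convention that is consistent with the figures in this section but is not stated in the text; the function name percent_identity is illustrative only.

```python
def percent_identity(matches: int, aligned_length: int) -> int:
    """Return identity as a whole-number percentage.

    Assumption: the value is truncated, not rounded, to match the
    figures quoted in the text.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= matches <= aligned_length:
        raise ValueError("matches must lie between 0 and aligned_length")
    return (100 * matches) // aligned_length


# NOV58 nucleic acid versus the retinol binding protein mRNA: 657 of 673 bases.
print(percent_identity(657, 673))
```

The same helper can be applied to the amino acid comparisons reported for the encoded polypeptide.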
The disclosed NOV58 polypeptide (SEQ ID NO:134) encoded by SEQ ID NO:133 has
201 amino acid residues and is presented in Table 58B using the one-letter amino acid code.
SignalP, Psort and/or Hydropathy results predict that NOV58 has a signal peptide and is
likely to be localized extracellularly with a certainty of 0.3700. The most likely cleavage site
for a NOV58 peptide is between amino acids 18 and 19.
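
A predicted cleavage site between residues 18 and 19 divides the precursor into a signal peptide and the start of the mature protein. The following is a minimal sketch of that split, using only the first 27 residues of the NOV58 sequence; the helper name split_signal_peptide is hypothetical and the code does not itself predict the cleavage site.

```python
def split_signal_peptide(protein: str, cleavage_after: int) -> tuple[str, str]:
    """Split a precursor at a cleavage site given as the last residue
    number (1-based) of the signal peptide."""
    if not 0 < cleavage_after < len(protein):
        raise ValueError("cleavage site must fall inside the sequence")
    return protein[:cleavage_after], protein[cleavage_after:]


# N-terminal residues of NOV58 (one-letter code).
nov58_n_terminus = "MKWVWALLLLAALGSGRAERDCRVSSF"
signal_peptide, mature_start = split_signal_peptide(nov58_n_terminus, 18)
print(signal_peptide, mature_start)
```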



Table 58B. Encoded NOV58 protein sequence (SEQ ID NO:134). 



MKWVWALLLLAALGSGRAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQDNIV
AEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQKGND
DHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEAQKIVRQRQEELCLA 
RQYRLIVHNGYCDGRSERNLL 
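
The relationship between the nucleotide sequence of Table 58A and the protein of Table 58B can be illustrated by translating the beginning of the open reading frame. This is a sketch under stated assumptions: it uses the standard genetic code, reads only the first 27 codons starting at the ATG start codon, and the names CODON_TABLE and translate are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 0, stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 27 codons of the NOV58 open reading frame, written codon by codon.
orf_start = (
    "ATG AAG TGG GTG TGG GCG CTC TTG CTG TTG GCG GCG CTG GGC "
    "AGC GGC CGC GCG GAG CGC GAC TGC CGA GTG AGC AGC TTC"
)
print(translate(orf_start))
```

Ambiguous bases such as N are not handled and raise a KeyError in this sketch.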



A search of sequence databases reveals that the NOV58 amino acid sequence has 196
of 201 amino acid residues (97%) identical to, and 197 of 201 amino acid residues (98%)
similar to, the 199 amino acid residue ptnr:SWISSNEW-ACC:P02753 protein from Homo
sapiens (Human) (PLASMA RETINOL-BINDING PROTEIN PRECURSOR (PRBP) (RBP)).
Public amino acid databases include the GenBank databases, SwissProt, PDB and PIR.

NOV58 is expressed in at least adrenal gland, bone marrow, brain - amygdala, brain -
cerebellum, brain - hippocampus, brain - substantia nigra, brain - thalamus, brain - whole, fetal
brain, fetal kidney, fetal liver, fetal lung, heart, kidney, lymphoma - Raji, mammary gland,
pancreas, pituitary gland, placenta, prostate, salivary gland, skeletal muscle, small intestine,
spinal cord, spleen, stomach, testis, thyroid, trachea and uterus. This information was derived
by determining the tissue sources of the sequences that were included in the invention
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or
RACE sources.

The disclosed NOV58 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 58C.



Table 58C. BLAST results for NOV58

Gene Index/Identifier                           Protein/Organism                                              Length (aa)  Identity (%)    Positives (%)   Expect
gi|18088326|gb|AAH20633.1|AAH20633 (BC020633)   Similar to retinol binding protein 4, plasma [Homo sapiens]   201          200/201 (99%)   200/201 (99%)   e-113
gi|2136468|pir||I46257                          retinol binding protein precursor - horse                     201          186/201 (92%)   195/201 (96%)   e-107
gi|3041715|sp|P27485|RETB_PIG                   PLASMA RETINOL-BINDING PROTEIN PRECURSOR (PRBP) (RBP)         201          184/201 (91%)   193/201 (95%)   e-106
gi|1710096|sp|P06912|RETB_RABIT                 PLASMA RETINOL-BINDING PROTEIN PRECURSOR (PRBP) (RBP)         201          183/201 (91%)   194/201 (96%)   e-106
gi|89271|pir||A39486                            plasma retinol-binding protein precursor - pig                201          184/201 (91%)   192/201 (94%)   e-106


Table 58D lists the domain descriptions from DOMAIN analysis results against
NOV58. This indicates that the NOV58 sequence has properties similar to those of other
proteins known to contain this domain.

Table 58D. Domain Analysis of NOV58

gnl|Pfam|pfam00061, lipocalin, Lipocalin / cytosolic fatty-acid
binding protein family. Lipocalins are transporters for small
hydrophobic molecules, such as lipids, steroid hormones, bilins, and
retinoids. Alignment subsumes both the lipocalin and fatty acid
binding protein signatures from PROSITE. This is supported on
structural and functional grounds. Structure is an eight-stranded beta
barrel.

CD-Length = 145 residues, 100.0% aligned

Score = 102 bits (255), Expect = 2e-23



Vitamin A is mobilized from liver stores and transported in plasma in the form of the
lipid alcohol retinol, bound to a specific transport protein, retinol-binding protein (RBP). A
great deal is known about the chemical structure, metabolism, and biological roles of RBP.
RBP is a single polypeptide chain with molecular weight close to 20,000. RBP interacts
strongly with plasma prealbumin, and normally circulates in plasma as a 1:1 molar
RBP-prealbumin complex. Both the primary and the tertiary structure of prealbumin are known, and
the primary structure of RBP has recently been reported. Much information is available about
the protein-protein and protein-ligand interactions that are involved in this transport system.
Many clinical studies have examined the effects of a variety of diseases on the plasma levels
of RBP and prealbumin in humans. Plasma RBP levels are low in patients with liver disease
and are high in patients with chronic renal disease. These findings reflect the facts that RBP is
produced in the liver and mainly catabolized in the kidneys. Delivery of retinol to
extra-hepatic tissues appears to involve specific cell surface receptors for RBP. Vitamin A
mobilization from the liver, and delivery to peripheral tissues, is highly regulated by factors
that control the rates of RBP production and secretion. Retinol deficiency specifically blocks
the secretion of RBP, so that plasma RBP levels fall and liver RBP levels rise. Injection of
retinol into vitamin A-deficient rats stimulates the rapid secretion of RBP from the liver into
the plasma (See Goodman D.S., 1980, Ann N Y Acad Sci 348:378-90).

The disclosed NOV58 nucleic acid of the invention encoding a plasma retinol binding
protein-like protein includes the nucleic acid whose sequence is provided in Table 58A or a
fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose
bases may be changed from the corresponding base shown in Table 58A while still encoding a
protein that maintains its plasma retinol binding protein-like activities and physiological 
functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids 
whose sequences are complementary to those just described, including nucleic acid fragments 
that are complementary to any of the nucleic acids just described. The invention additionally 
includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures 
include chemical modifications. Such modifications include, by way of nonlimiting example, 
modified bases, and nucleic acids whose sugar phosphate backbones are modified or 
derivatized. These modifications are carried out at least in part to enhance the chemical 
stability of the modified nucleic acid, such that they may be used, for example, as antisense 
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic 
acids, and their complements, up to about 3 percent of the bases may be so changed. 

The disclosed NOV58 protein of the invention includes the plasma retinol binding 
protein-like protein whose sequence is provided in Table 58B. The invention also includes a 
mutant or variant protein any of whose residues may be changed from the corresponding 
residue shown in Table 58B while still encoding a protein that maintains its plasma retinol 
binding protein-like activities and physiological functions, or a functional fragment thereof. In 
the mutant or variant protein, up to about 3 percent of the residues may be so changed. 
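
The percentage limits stated for variants can be read as a bound on the fraction of positions that differ from the disclosed sequence. The following is a minimal sketch of such a check, assuming substitutions only (no insertions or deletions) and sequences of equal length; the function names are illustrative and the test sequences are synthetic placeholders, not disclosed sequences.

```python
def fraction_changed(reference: str, variant: str) -> float:
    """Fraction of positions at which the variant differs from the reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    if len(reference) != len(variant):
        raise ValueError("this sketch only compares sequences of equal length")
    differences = sum(a != b for a, b in zip(reference, variant))
    return differences / len(reference)


def within_limit(reference: str, variant: str, max_percent: float) -> bool:
    """True if no more than max_percent of the positions are changed."""
    return 100 * fraction_changed(reference, variant) <= max_percent


# Synthetic 201-residue example: 6 substitutions is about 2.99 percent.
reference = "A" * 201
variant = "G" * 6 + "A" * 195
print(within_limit(reference, variant, 3))
```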

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this plasma retinol 
binding protein-like protein (NOV58) may function as a member of a "plasma retinol binding 
protein family". Therefore, the NOV58 nucleic acids and proteins identified here may be 
useful in potential therapeutic applications implicated in (but not limited to) various 
pathologies and disorders as indicated below. The potential therapeutic applications for this 
invention include, but are not limited to: protein therapeutic, small molecule drug target, 
antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or 
prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue 
regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to) 
those defined here.

The NOV58 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the plasma retinol binding
protein-like protein (NOV58) may be useful in gene therapy, and the plasma retinol binding
protein-like protein (NOV58) may be useful when administered to a subject in need thereof.
By way of nonlimiting example, the compositions of the present invention will have efficacy
for treatment of patients suffering from cervical dysplasias and cancer, breast cancer,
phenylketonuria, liver diseases, kidney diseases, Alzheimer's disease, infection and inflammations, or
other pathologies or conditions. The NOV58 nucleic acid encoding the plasma retinol binding
protein-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV58 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV58 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV58 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.
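
Hydrophilic regions of the kind mentioned above are commonly located with a sliding-window hydropathy profile. The sketch below assumes the Kyte-Doolittle scale and a window of nine residues; neither choice is specified in the text, and the function name hydropathy_profile is illustrative. Windows with negative averages are the more hydrophilic ones.

```python
# Kyte-Doolittle hydropathy values (assumed scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Average hydropathy of every full window along the sequence."""
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# First 27 residues of NOV58: the signal peptide region scores as hydrophobic.
profile = hydropathy_profile("MKWVWALLLLAALGSGRAERDCRVSSF")
print([round(value, 2) for value in profile])
```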

NOV59 

A disclosed NOV59 nucleic acid of 1647 nucleotides (also referred to as CG57566-01) 
encoding a HIV-1 inducer of short transcripts binding protein like protein-like protein is 
shown in Table 59A. The start and stop codons are in bold letters.



Table 59A. NOV59 nucleotide sequence (SEQ ID NO:135). 



ATGGCCGGCGGCGTGGACGGCCCCATCGGGATCCCGTTCCCCGACCACAGCAGCGACATC 
CTGAGTGGGCTGAACGAGCAGCGGACGCAGGGCCTGCTGTGCGACGTGGTGATCCTGGTG 
GAGGGCCGCGAGTTCCCCACGCACCGCTCGGTGCTGGCCGCCTGCAGCCAGTACTTCAAG 
AAGCTGTTCACGTCGGGCGCCGTGGTGGACCAGCAGAACGTGTACGAGATCGACTTCGTC 
AGCGCCGAGGCGCTCACCGCGCTCATGGACTTCGCCTACACGGCCACGCTCACCGTCAGC 
ACAGCCAACGTGGGTGACATCCTCAGCGCCGCCCGCCTGCTGGAGATCCCCGCCGTGAGC 
CACGTGTGCGCCGACCTCCTGGACCGGCAGATCCTGGCGGCCGACGCGGGCGCCGACGCC 
GGGCAGCTGGACCTTGTAGATCAAATTGATCAGCGCAACCTCCTCCGCGCCAAGGAGTAC 
CTCGAGTTCTTCCAGAGCAACCCCATGAACAGCCTGCCCCCCGCGGCCGCCGCCGCCGCT 
GCCAGCTTCCCGTGGTCCGCCTTTGGGGCGTCCGATGATGACCTGGATGCCACCAAGGAG 
GCCGTGGCCGCCGCTGTGGCCGCCGTGGCCGCGGGCGACTGCAACGGCTTAGACTTCTAT
GGGCCGGGCCCCCCGGCCGAGCGGCCCCCGACGGGGGACGGGGACGAGGGCGACAGCAAC
CCGGGTCTGTGGCCAGAGCGGGATGAGGACGCCCCCACCGGGGGTCTCTTTCCGCCGCCG 
GTGGCCCCGCCGGCCGCCACGCAGAACGGCCACTACGGCCGCGGCGGAGAGGAGGAGGCC 
GCCTCGCTGTCGGAGGCGGCCCCCGAGCCGGGCGACTCTCCGGGCTTCCTGTCGGGAGAC 
AGCGACGAGGAGTCGCGGGCCGACGACAAGGGCGTCATGGACTACTACCTGAAGTACTTC 
AGCGGCGCCCACGACGGCGACGTCTACCCGGCCTGGTCGCAGAAGGTGGAGAAGAAGATC 
CGAGCCAAGGCCTTCCAGAAGTGCCCCATCTGCGAGAAGGTCATCCAGGGCGCCGGCAAG 
CTGCCGCGACACATCCGCACCCACACGGGCGAGAAGCCCTACGAGTGCAACATCTGCAAG 
GTCCGCTTCACCAGGCAGGACAAGCTGAAGGTGCACATGCGGAAGCACACGGGCGAGAAG 
CCGTACCTGTGCCAGCAGTGCGGCGCCGCCTTTGCCCACAACTACGACCTGAAGAACCAC 
ATGCGCGTGCACACGGGCCTGCGCCCCTACCAGTGCGACAGCTGCTGCAAGACCTTCGTC 
CGCTCCGACCACCTGCACAGACACCTCAAGAAAGACGGCTGCAACGGCGTCCCCTCGCGC 
CGCGGCCGCAAGCCCCGCGTCCGGGGCGGGGCGCCCGACCCCAGCCCGGGGGCCACCGCG 
ACCCCCGGCGCCCCCGCCCAGCCCAGCTCCCCCGACGCCCGGCGCAACGGCCAGGAGAAG 
CACTTTAAGGACGAGGACGAGGACGAGGACGTGGCCAGCCCCGACGGCTTGGGCCGGTTG 
AATGTAGCGGGCGCCGGTGGAGGAGGTGACAGCGGAGGTGGCCCCGGGGCCGCCACCGAC 
GGTAACTTCACAGCCGGACTCGCCTAA 



In a search of public sequence databases, the NOV59 nucleic acid sequence has 1271 
of 1560 bases (81%) identical to a gb:GENBANK-ID:AF097916|acc:AF097916.1 mRNA 
from Homo sapiens (Homo sapiens HIV-1 inducer of short transcripts binding protein (FBI1) 
mRNA, complete cds). Public nucleotide databases include all GenBank databases and the 
GeneSeq patent database. 

The disclosed NOV59 polypeptide (SEQ ID NO: 136) encoded by SEQ ID NO: 135 has 
548 amino acid residues and is presented in Table 59B using the one-letter amino acid code. 
SignalP, Psort and/or Hydropathy results predict that NOV59 has no signal peptide and is
likely to be localized in the nucleus with a certainty of 0.8800. 
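
The stated lengths are mutually consistent: the sequence of Table 59A begins at the start codon and ends at the stop codon, so 548 residues plus one stop codon account for all 1647 nucleotides. A short sketch of this arithmetic follows; the function name orf_length_nt is illustrative.

```python
def orf_length_nt(residues: int, include_stop: bool = True) -> int:
    """Number of nucleotides needed to encode a protein of the given length."""
    if residues < 0:
        raise ValueError("residues must not be negative")
    codons = residues + (1 if include_stop else 0)
    return 3 * codons


# NOV59: 548 residues plus the stop codon.
print(orf_length_nt(548))
# NOV58: 201 residues plus the stop codon; the remainder of the 688
# nucleotides in Table 58A is flanking untranslated sequence.
print(688 - orf_length_nt(201))
```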



Table 59B. Encoded NOV59 protein sequence (SEQ ID NO:136).



MAGGVDGPIGIPFPDHSSDILSGLNEQRTQGLLCDVVILVEGREFPTHRSVLAACSQYFK
KLFTSGAVVDQQNVYEIDFVSAEALTALMDFAYTATLTVSTANVGDILSAARLLEIPAVS
HVCADLLDRQILAADAGADAGQLDLVDQIDQRNLLRAKEYLEFFQSNPMNSLPPAAAAAA
ASFPWSAFGASDDDLDATKEAVAAAVAAVAAGDCNGLDFYGPGPPAERPPTGDGDEGDSN 
PGLWPERDEDAPTGGLFPPPVAPPAATQNGHYGRGGEEEAASLSEAAPEPGDSPGFLSGD 
SDEESRADDKGVMDYYLKYFSGAHDGDVYPAWSQKVEKKIRAKAFQKCPICEKVIQGAGK 
LPRHIRTHTGEKPYECNICKVRFTRQDKLKVHMRKHTGEKPYLCQQCGAAFAHNYDLKNH 
MRVHTGLRPYQCDSCCKTFVRSDHLHRHLKKDGCNGVPSRRGRKPRVRGGAPDPSPGATA 
TPGAPAQPSSPDARRNGQEKHFKDEDEDEDVASPDGLGRLNVAGAGGGGDSGGGPGAATD
GNFTAGLA 



A search of sequence databases reveals that the NOV59 amino acid sequence has 344
of 458 amino acid residues (75%) identical to, and 362 of 458 amino acid residues (79%)
similar to, the 584 amino acid residue ptnr:SPTREMBL-ACC:O95365 protein from Homo
sapiens (Human) (HIV-1 INDUCER OF SHORT TRANSCRIPTS BINDING PROTEIN).
Public amino acid databases include the GenBank databases, SwissProt, PDB and PIR.

NOV59 is expressed in at least tumor, inflamed, and brain tissue. This information was
derived by determining the tissue sources of the sequences that were included in the invention 
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or 
RACE sources. 

The disclosed NOV59 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 59C. 



Table 59C. BLAST results for NOV59

Gene Index/Identifier                         Protein/Organism                                                    Length (aa)  Identity (%)    Positives (%)   Expect
gi|[illegible]|ref|NP_056982.1| (NM_015898)   HIV-1 inducer of short transcripts binding protein [Homo sapiens]   584          548/584 (93%)   548/584 (93%)   e-167
gi|16758916|ref|NP_446454.1| (NM_054002)      leukemia/lymphoma related factor [Rattus norvegicus]                569          477/579 (82%)   489/579 (84%)   e-159
gi|6754572|ref|NP_034861.1| (NM_010731)       leukemia/lymphoma related factor [Mus musculus]                     565          479/579 (82%)   488/579 (83%)   e-125
gi|3599513|gb|AAC35368.1| (AF086831)          leukemia/lymphoma related factor cLRF [Gallus gallus]               546          359/578 (62%)   400/578 (69%)   4e-13
gi|2145062|gb|AAB58414.1| (AF000561)          TTF-I interacting peptide 21; TIP21; Transcription Termination      590          293/297 (98%)   295/297 (98%)   4e-12
                                              Factor I Interacting Peptide 21 [Homo sapiens]



Tables 59D-E list the domain descriptions from DOMAIN analysis results against
NOV59. This indicates that the NOV59 sequence has properties similar to those of other
proteins known to contain these domains.

Table 59D. Domain Analysis of NOV59

gnl|Pfam|pfam00651, BTB, BTB/POZ domain. The BTB (for BR-C, ttk and
bab) or POZ (for Pox virus and Zinc finger) domain is present near the
N-terminus of a fraction of zinc finger (pfam00096) proteins and in
proteins that contain the pfam01344 motif such as Kelch and a family
of pox virus proteins. The BTB/POZ domain mediates homomeric
dimerisation and in some instances heteromeric dimerisation. The
structure of the dimerised PLZF BTB/POZ domain has been solved and
consists of a tightly intertwined homodimer. The central scaffolding
of the protein is made up of a cluster of alpha-helices flanked by
short beta-sheets at both the top and bottom of the molecule. POZ
domains from several zinc finger proteins have been shown to mediate
transcriptional repression and to interact with components of histone
deacetylase co-repressor complexes including N-CoR and SMRT. The POZ
or BTB domain is also known as BR-C/Ttk or ZiN.

CD-Length = 114 residues, 100.0% aligned

Score = 122 bits (305), Expect = 7e-29



Table 59E. Domain Analysis of NOV59 

gnl|Pfam|pfam00096, zf-C2H2, Zinc finger, C2H2 type. The C2H2 zinc
finger is the classical zinc finger domain. The two conserved
cysteines and histidines co-ordinate a zinc ion. The following pattern
describes the zinc finger: #-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C],
where X can be any amino acid, and numbers in brackets indicate the
number of residues. The positions marked # are those that are
important for the stable fold of the zinc finger. The final position
can be either his or cys. The C2H2 zinc finger is composed of two
short beta strands followed by an alpha helix. The amino terminal part
of the helix binds the major groove in DNA binding zinc fingers.

CD-Length = 23 residues, 100.0% aligned 

Score = 38.5 bits (88), Expect = 0.001 
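
The zinc finger pattern given in Table 59E can be written as a regular expression and applied to a protein sequence. The sketch below is a simplification made for illustration: the positions marked # are treated as any residue, which is less strict than the domain model behind the table, and the name find_c2h2_fingers is hypothetical. The test segment is taken from the C-terminal region of the NOV59 sequence in Table 59B.

```python
import re

# #-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C], with '#' and 'X' both
# relaxed to "any residue" (an assumption made for this sketch).
C2H2_PATTERN = re.compile(r"..C.{1,5}C.{3}..{5}..{2}H.{3,6}[HC]")


def find_c2h2_fingers(protein: str) -> list[str]:
    """Return non-overlapping matches of the relaxed C2H2 pattern."""
    return [match.group(0) for match in C2H2_PATTERN.finditer(protein)]


# Segment of NOV59 (Table 59B) spanning its zinc finger region.
segment = (
    "QKCPICEKVIQGAGK"
    "LPRHIRTHTGEKPYECNICKVRFTRQDKLKVHMRKHTGEKPYLCQQCGAAFAHNYDLKNH"
    "MRVHTGLRPYQCDSCCKTFVRSDHLHRHLKKDGCNGV"
)
for finger in find_c2h2_fingers(segment):
    print(finger)
```

Under these assumptions the segment yields four matches, the last of which ends in cysteine rather than histidine, as the pattern permits.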



The HIV-1 promoter directs the synthesis of two classes of transcripts: short,
non-polyadenylated transcripts and full-length, polyadenylated transcripts. The synthesis of the
short transcripts is activated by a bipartite DNA element, the inducer of short transcripts or IST,
located downstream of the HIV-1 transcriptional start site, while the synthesis of full-length
transcripts is activated by the viral activator Tat. Tat binds to the RNA element TAR, which is
encoded largely between the two IST half-elements. Upon activation by Tat, the synthesis of
short RNAs is repressed. A factor called FBI-1 (for factor that binds to IST), whose binding to
wild-type and mutated ISTs correlated well with the abilities of these ISTs to direct the
synthesis of short transcripts, was identified by Morrison et al. (See Morrison et al., Nucleic
Acids Res 1999 Mar 1:1251-62). FBI-1 contains a POZ domain at its N-terminus and four
Kruppel-type zinc fingers at its C-terminus. The C-terminus is sufficient for specific binding,
and FBI-1 can form homomers through its POZ domain and, in vivo, through its zinc finger
domain as well. In addition, FBI-1 associates with Tat, suggesting that repression of the short
transcripts by Tat may be mediated through interactions between the two factors.

The disclosed NOV59 nucleic acid of the invention encoding a HIV-1 inducer of short
transcripts binding protein like protein-like protein includes the nucleic acid whose sequence
is provided in Table 59A or a fragment thereof. The invention also includes a mutant or
variant nucleic acid any of whose bases may be changed from the corresponding base shown
in Table 59A while still encoding a protein that maintains its HIV-1 inducer of short
transcripts binding protein like protein-like activities and physiological functions, or a
fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences
are complementary to those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 19 percent of the bases may be so changed.

The disclosed NOV59 protein of the invention includes the HIV-1 inducer of short
transcripts binding protein like protein-like protein whose sequence is provided in Table 59B.
The invention also includes a mutant or variant protein any of whose residues may be changed
from the corresponding residue shown in Table 59B while still encoding a protein that
maintains its HIV-1 inducer of short transcripts binding protein like protein-like activities and
physiological functions, or a functional fragment thereof. In the mutant or variant protein, up
to about 25 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this HIV-1 inducer of
short transcripts binding protein like protein-like protein (NOV59) may function as a member
of a "HIV-1 inducer of short transcripts binding protein like protein family". Therefore, the
NOV59 nucleic acids and proteins identified here may be useful in potential therapeutic
applications implicated in (but not limited to) various pathologies and disorders as indicated
below. The potential therapeutic applications for this invention include, but are not limited to:
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV59 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the HIV-1 inducer of short
transcripts binding protein like protein-like protein (NOV59) may be useful in gene therapy,
and the HIV-1 inducer of short transcripts binding protein like protein-like protein (NOV59)
may be useful when administered to a subject in need thereof. By way of nonlimiting
example, the compositions of the present invention will have efficacy for treatment of patients
suffering from Human Immunodeficiency Virus/Acquired Immune Deficiency Syndrome,
cancer, and inflammatory diseases, or other pathologies or conditions. The NOV59 nucleic
acid encoding the HIV-1 inducer of short transcripts binding protein like protein-like protein
of the invention, or fragments thereof, may further be useful in diagnostic applications,
wherein the presence or amount of the nucleic acid or the protein is to be assessed.

NOV59 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV59 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV59 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.

NOV60

NOV60 includes three beta tectorin-like proteins disclosed below. The disclosed 
sequences have been named NOV60a, NOV60b and NOV60c. 

NOV60a 

A disclosed NOV60a nucleic acid of 1011 nucleotides (also referred to as CG57574-01)
encoding a beta tectorin-like protein is shown in Table 60A. The start and stop codons are
in bold letters.



Table 60A. NOV60a nucleotide sequence (SEQ ID NO:137).

GATCGAGGCTCAGGCCCTGGAAGGACCGTAAACATTTGGCCAGCTTGGTTTGGATACCTG

GCAGAGACCAGGTTCTGAGAAGCAATGGTGACGAAGGCCTTTGTCTTGTTGGCCATCTTT

GCAGAAGCCTCTGCAAAATCGTGTGCTCCAAATAAAGCAGATGTCATTCTTGTGTTTTGC 

TATCCCAAAACCATCATCACCAAAATCCCCGAGTGTCCCTATGGATGGGAAGTTCATCAG 

CTGGCCCTCGGAGGGCTGTGTTACAATGGGGTCCACGAAGGAGGTTACTACCAATTTGTG 

ATCCCAGATTTATCACCTAAAAACAAGTCCTATTGTGGAACCCAGTCTGAGTACAAGCCA 

CCTATCTATCACTTCTACAGTCACATCGTTTCCAATGACACCACAGTGATTGTAAAAAAC 

CAGCCTGTCAACTACTCCTTCTCCTGCACCTACCACTCCACCTACTTGGTGAACCAGGCT 

GCCTTTGACCAGAGTGTCAATTTCCTTCCAAAGAATGCCAAGTTCTCCATCAAGAAAGAA 

GCTCCCTTTGTCCTGGAGGCATCCGAAATCGGTTCAGATCTGTTTGCAGGAGTGGAAGCC 

AAAGGGTTAAGCATTAGGTTTAAAGTGGTCTTGAACAGCTGTTGGGCCACCCCCTCGGCT 

GACTTCATGTATCCCTTGCAGTGGCAGCTGATCAACAAGGGCTGCCCCACGGATGAAACC 

GTCCTCGTGCATGAGAATGGGAGAGATCACAGGGCAACCTTCCAATTCAATGCTTTCCGG 

TTCCAGAACATCCCCAAACTCTCCAAGGTGTGGTTACACTGTGAGACGTTCATCTGCGAC 

AGTGAGAAACTCTCCTGCCCAGTGACCTGCGATAAACGGAAGCGCCTCCTGCGAGACCAG 

ACCGGGGGAGTCCTGGTCGTGGAGCTCTCCCTGCGGAATGTTCTCCACCACCTCATCATG 

ATGTTGGGGATTTGTGCCGTGTTATAGGAGTTAGCCAGGCAGCTGCCGCTCCTCCACCCA

CAATAG



In a search of public sequence databases, the NOV60a nucleic acid sequence, located
on chromosome 10, has 428 of 496 bases (86%) identical to a
gb:GENBANK-ID:MMBETATEC|acc:X99806.2 mRNA from Mus musculus (Mus musculus mRNA for beta
tectorin). Public nucleotide databases include all GenBank databases and the GeneSeq patent
database.

The disclosed NOV60a polypeptide (SEQ ID NO: 138) encoded by SEQ ID NO: 137
has 300 amino acid residues and is presented in Table 60B using the one-letter amino acid
code. SignalP, Psort and/or Hydropathy results predict that NOV60a has a signal peptide and
is likely to be localized to the plasma membrane with a certainty of 0.6850. The most likely
cleavage site for a NOV60a peptide is between amino acids 17 and 18.
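
By way of nonlimiting illustration, the relationship between a disclosed nucleotide sequence, its encoded polypeptide, and a predicted signal peptide cleavage site can be sketched as follows. This is a minimal sketch, not the method used to prepare the tables: it assumes the standard genetic code, assumes that the coding region begins at the first ATG in the sequence, and treats the cleavage position as a given input. The function names `translate_first_orf` and `split_signal_peptide` are hypothetical.

```python
# Standard genetic code (NCBI translation table 1), indexed by base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_first_orf(dna: str) -> str:
    """Translate from the first ATG up to (not including) the first in-frame stop codon.

    Assumption: the first ATG is the start codon. If no stop codon is reached,
    translation simply runs to the last complete codon.
    """
    dna = dna.upper().replace(" ", "")
    start = dna.find("ATG")
    if start < 0:
        return ""
    residues = []
    for pos in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[pos:pos + 3], "X")  # "X" marks an unreadable codon
        if residue == "*":  # stop codon
            break
        residues.append(residue)
    return "".join(residues)


def split_signal_peptide(protein: str, cleavage_after: int) -> tuple:
    """Split a precursor into (signal peptide, mature chain) after the given residue number."""
    return protein[:cleavage_after], protein[cleavage_after:]


# A short fragment around the start codon of Table 60A (untranslated bases, then ATG...).
fragment = "GAGAAGCAATGGTGACGAAGGCCTTTGTCTTGTTGGCCATCTTT"
n_terminus = translate_first_orf(fragment)

# Cleavage between amino acids 17 and 18, applied to the first residues of Table 60B.
signal, mature = split_signal_peptide("MVTKAFVLLAIFAEASAKSCAPNK", 17)
```

In this sketch the fragment translates to the first twelve residues of the polypeptide, and a cleavage position of 17 leaves a mature chain beginning at the lysine at position 18.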



Table 60B. Encoded NOV60a protein sequence (SEQ ID NO: 138). 

MVTKAFVLLAIFAEASAKSCAPNKADVILVFCYPKTIITKIPECPYGWEVHQLALGGLCY
NGVHEGGYYQFVIPDLSPKNKSYCGTQSEYKPPIYHFYSHIVSNDTTVIVKNQPVNYSFS
CTYHSTYLVNQAAFDQSVNFLPKNAKFSIKKEAPFVLEASEIGSDLFAGVEAKGLSIRFK
VVLNSCWATPSADFMYPLQWQLINKGCPTDETVLVHENGRDHRATFQFNAFRFQNIPKLS
KVWLHCETFICDSEKLSCPVTCDKRKRLLRDQTGGVLVVELSLRNVLHHLIMMLGICAVL



A search of sequence databases reveals that the NOV60a amino acid sequence has 149 
of 174 amino acid residues (85%) identical to, and 155 of 174 amino acid residues (89%) 
similar to, the 329 amino acid residue ptnr:SPTREMBL-ACC:O08524 protein from Mus 
musculus (Mouse) (BETA TECTORIN). Public amino acid databases include the GenBank 
databases, SwissProt, PDB and PIR. 
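
The percentages quoted with the database matches follow from the match counts. The sketch below is an illustration only; the helper name `percent_identity` is hypothetical, and the use of truncation rather than rounding is an inference from the quoted figures (149 of 174 is about 85.6%, which the text reports as 85%), not something the text states.

```python
def percent_identity(matches: int, aligned: int) -> int:
    """Whole-number percentage of matching positions, truncated toward zero."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# Figures quoted for NOV60a in this section.
nucleotide_identity = percent_identity(428, 496)  # bases identical to X99806.2
protein_identity = percent_identity(149, 174)     # residues identical to O08524
protein_similarity = percent_identity(155, 174)   # residues similar to O08524
```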

NOV60a is predicted to be expressed in the following tissues because of the expression
pattern of (GENBANK-ID: gb:GENBANK-ID:MMBETATEC|acc:X99806.2), a closely
related Mus musculus mRNA for beta tectorin homolog in species Mus musculus: cochleae.
This information was derived by determining the tissue sources of the sequences that were
included in the invention including but not limited to SeqCalling sources, Public EST sources,
Literature sources, and/or RACE sources.

NOV60b 

A disclosed NOV60b nucleic acid of 1012 nucleotides (also referred to as CG57574- 
02) encoding a beta tectorin-like protein is shown in Table 60C. The start and stop codons are 
in bold letters. 






Table 60C. NOV60b nucleotide sequence (SEQ ID NO: 139). 



AGAGACCAGGTTCTGAGAAGCAATGGTGACGAAGGCCTTTGTCTTGTTGGCCATCTTTGCAGAAGCCTCT 
GCAAAATCGTGTGCTCCAAATAAAGCAGATGTCATTCTTGTGTTTTGCTATCCCAAAACCATCATCACCA
AAATCCCCGAGTGTCCCTATGGATGGGAAGTTCATCAGCTGGCCCTCGGAGGGCTGTGTTACAATGGGGT 
CCACGAAGGAGGTTACTACCAATTTGTGATCCCAGATTTATCACCTAAAAACAAGTCCTATTGTGGAACC 
CAGTCTGAGTACAAGCCACCTATCTATCACTTCTACAGTCACATCGTTTCCAATGACGCCACAGTGATTG 
TAAAAAACCAGCCTGTCAACTACTCCTTCTCCTGCACCTACCACTCCACCTACTTGGTGAACCAGGCTGC 
CTTTGACCAGAGAGTGGCCACTGTTCACGTGAAGAACGGGAGCATGGGCACATTTGAGAGCCAACTGTCT 
CTCAACTTCTACACTAATGCCAAGTTCTCCATCAAGAAAGAAGCTCCCTTTGTCCTGGAGGCATCCGAAA 
TCGGTTCAGATCTGTTTGCAGGAGTGGAAGCCAAAGGGTTAAGCATTAGGTTTAAAGTGGTCTTGAACAG
CTGTTGGGCCACCCCCTCGGCTGACTTCATGTATCCCTTGCAGTGGCAGCTGATCAACAAGGGCTGCCCC 
ACGGATGAAACCGTCCTCGTGCATGAGAATGGGAGAGATCACAGGGCAACCTTCCAATTCAATGCTTTCC 
GGTTCCAGAACATCCCCAAACTCTCCAAGGTGTGGTTACACTGTGAGACGTTCATCTGCGACAGTGAGAA 
ACTCTCCTGCCCAGTGACCTGCGATAAACGGAAGCGCCTCCTGCGAGACCAGACCGGGGGAGTCCTGGTC 
GTGGAGCTCTCCCTGCGGAGCAGGGGATTTTCCAGTCTCTATAGCTTCTCAGATGTTCTCCACCACCTCA 
TCATGATGTTGGGGATTTGTGCCGTGTTATAG 



In a search of public sequence databases, the NOV60b nucleic acid sequence, located
on chromosome 10, has 887 of 1012 bases (87%) identical to a
gb:GENBANK-ID:MMBETATEC|acc:X99806.2 mRNA from Mus musculus (Mus musculus mRNA for beta
tectorin). Public nucleotide databases include all GenBank databases and the GeneSeq patent 
database. 

The disclosed NOV60b polypeptide (SEQ ID NO: 140) encoded by SEQ ID NO: 139 
has 329 amino acid residues and is presented in Table 60D using the one-letter amino acid 
code. Signal P, Psort and/or Hydropathy results predict that NOV60b has a signal peptide and 
is likely to be localized extracellularly with a certainty of 0.6850. The most likely cleavage site 
for a NOV60b peptide is between amino acids 17 and 18. 



Table 60D. Encoded NOV60b protein sequence (SEQ ID NO:140). 



MVTKAFVLLAIFAEASAKSCAPNKADVILVFCYPKTIITKIPECPYGWEVHQLALGGLCY
NGVHEGGYYQFVIPDLSPKNKSYCGTQSEYKPPIYHFYSHIVSNDATVIVKNQPVNYSFS
CTYHSTYLVNQAAFDQRVATVHVKNGSMGTFESQLSLNFYTNAKFSIKKEAPFVLEASEI
GSDLFAGVEAKGLSIRFKVVLNSCWATPSADFMYPLQWQLINKGCPTDETVLVHENGRDH
RATFQFNAFRFQNIPKLSKVWLHCETFICDSEKLSCPVTCDKRKRLLRDQTGGVLVVELS
LRSRGFSSLYSFSDVLHHLIMMLGICAVL

A search of sequence databases reveals that the NOV60b amino acid sequence has
310 of 329 amino acid residues (94%) identical to, and 317 of 329 amino acid residues
(96%) similar to, the 329 amino acid residue ptnr:SPTREMBL-ACC:O08524 protein from
Mus musculus (Mouse) (BETA TECTORIN). Public amino acid databases include the
GenBank databases, SwissProt, PDB and PIR.

NOV60b is predicted to be expressed in at least the ear. This information was derived 
by determining the tissue sources of the sequences that were included in the invention 
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or 
RACE sources. 

NOV60c 

A disclosed NOV60c nucleic acid of 1012 nucleotides (also referred to as CG57574- 
02) encoding a beta tectorin-like protein is shown in Table 60E. The start and stop codons are 
in bold letters. 



Table 60E. NOV60c nucleotide sequence (SEQ ID NO: 141). 



AGAGACCAGGTTCTGAGAAGCAATGGTGACGAAGGCCTTTGTCTTGTTGGCCATCTTTGC
AGAAGCCTCTGCAAAATCGTGTGCTCCAAATAAAGCAGATGTCATTCTTGTGTTTTGCTA
TCCCAAAACCATCATCACCAAAATCCCCGAGTGTCCCTATGGATGGGAAGTTCATCAGCT
GGCCCTCGGAGGGCTGTGTTACAATGGGGTCCACGAAGGAGGTTACTACCAATTTGTGAT 
CCCAGATTTATCACCTAAAAACAAGTCCTATTGTGGAACCCAGTCTGAGTACAAGCCACC 
TATCTATCACTTCTACAGTCACATCGTTTCCAATGACACCACAGTGATTGTAAAAAACCA 
GCCTGTCAACTACTCCTTCTCCTGCACCTACCACTCCACCTACTTGGTGAACCAGGCTGC 
CTTTGACCAGAGAGTGGCCACTGTTCACGTGAAGAACGGGAGCATGGGCACATTTGAGAG 
CCAACTGTCTCTCAACTTCTACACTAATGCCAAGTTCTCCATCAAGAAAGAAGCTCCCTT 
TGTCCTGGAGGCATCGGAAATCGGTTCAGATCTGTTTGCAGGAGTGGAAGCCAAAGGGTT 
AAGCATTAGGTTTAAAGTGGTCTTGAACAGCTGTTGGGCCACCCCCTCGGCTGACTTCAT 
GTATCCCTTGCAGTGGCAGCTGATCAACAAGGGCTGCCCCACGGATGAAACCGTCCTCGT 
GCATGAGAATGGGAGAGATCACAGGGCAACCTTCCAATTCAATGCTTTCCGGTTCCAGAA 
CATCCCCAAACTCTCCAAGGTGTGGTTACACTGTGAGACGTTCATCTGCGACAGTGAGAA 
ACTCTCCTGCCCAGTGACCTGCGATAAACGGAAGCGCCTCCTGCGAGACCAGACCGGGGG 
AGTCCTGGTCGTGGAGCTCTCCCTGCGGAGCAGGGGATTTTCCAGTCTCTATAGCTTCTC 
AGATGTTCTCCACCACCTCATCATGATGTTGGGGATTTGTGCCGTGTTATAG 



In a search of public sequence databases, the NOV60c nucleic acid sequence, located
on chromosome 10, has 887 of 1012 bases (87%) identical to a
gb:GENBANK-ID:MMBETATEC|acc:X99806.2 mRNA from Mus musculus (Mus musculus mRNA for beta
tectorin). Public nucleotide databases include all GenBank databases and the GeneSeq patent 
database. 

The disclosed NOV60c polypeptide (SEQ ID NO: 142) encoded by SEQ ID NO: 141
has 329 amino acid residues and is presented in Table 60F using the one-letter amino acid
code. SignalP, Psort and/or Hydropathy results predict that NOV60c has a signal peptide and
is likely to be localized to the plasma membrane with a certainty of 0.6850. The most likely
cleavage site for a NOV60c peptide is between amino acids 17 and 18.

Table 60F. Encoded NOV60c protein sequence (SEQ ID NO:142). 

MVTKAFVLLAIFAEASAKSCAPNKADVILVFCYPKTIITKIPECPYGWEVHQLALGGLCY
NGVHEGGYYQFVIPDLSPKNKSYCGTQSEYKPPIYHFYSHIVSNDTTVIVKNQPVNYSFS
CTYHSTYLVNQAAFDQRVATVHVKNGSMGTFESQLSLNFYTNAKFSIKKEAPFVLEASEI
GSDLFAGVEAKGLSIRFKVVLNSCWATPSADFMYPLQWQLINKGCPTDETVLVHENGRDH
RATFQFNAFRFQNIPKLSKVWLHCETFICDSEKLSCPVTCDKRKRLLRDQTGGVLVVELS
LRSRGFSSLYSFSDVLHHLIMMLGICAVL



A search of sequence databases reveals that the NOV60c amino acid sequence has 310
of 329 amino acid residues (94%) identical to, and 317 of 329 amino acid residues (96%)
similar to, the 329 amino acid residue ptnr:SPTREMBL-ACC:O08524 protein from Mus
musculus (Mouse) (BETA TECTORIN). Public amino acid databases include the GenBank
databases, SwissProt, PDB and PIR.

NOV60c is predicted to be expressed in the following tissues because of the expression
pattern of (GENBANK-ID: gb:GENBANK-ID:MMBETATEC|acc:X99806.2), a closely
related Mus musculus mRNA for beta tectorin homolog in species Mus musculus: cochleae.
This information was derived by determining the tissue sources of the sequences that were 
included in the invention including but not limited to SeqCalling sources, Public EST sources, 
Literature sources, and/or RACE sources. 

The disclosed NOV60c polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 60G. 



Table 60G. BLAST results for NOV60c 



Gene Index/Identifier                     Protein/Organism                      Length (aa)  Identity (%)   Positives (%)  Expect
gi|17158035|ref|NP_478129.1| (NM_058222)  tectorin beta [Homo sapiens]          329          295/330 (89%)  297/330 (89%)  e-167
gi|7363457|ref|NP_033374.2| (NM_009348)   tectorin beta; [b]-tectorin           329          280/330 (84%)  289/330 (86%)  e-159
                                          [Mus musculus]
gi|1729889|sp|P54097|TECB_CHICK           BETA-TECTORIN PRECURSOR               329          220/330 (66%)  260/330 (78%)  e-125
gi|13385494|ref|NP_080265.1| (NM_025989)  RIKEN cDNA 2310037118 [Mus musculus]  531          63/208 (30%)   99/208 (47%)   4e-13
gi|12844889|dbj|BAB26538.1| (AK009843)    homolog to PANCREATIC SECRETORY       573          62/200 (31%)   96/200 (48%)   4e-12
                                          GRANULE MEMBRANE MAJOR GLYCOPROTEIN
                                          GP2 PRECURSOR (PANCREATIC ZYMOGEN
                                          GRANULE MEMBRANE PROTEIN GP-2)
                                          (ZAP75)-putative [Mus musculus]
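
The Expect column of Table 60G uses the abbreviated notation "e-167" for 1e-167. By way of illustration only, the following sketch converts such entries to numbers and orders the hits from most to least significant. The helper name `parse_expect` and the shortened hit labels are hypothetical; the Expect values are those listed in Table 60G.

```python
def parse_expect(text: str) -> float:
    """Convert an Expect entry such as 'e-167', '4e-13' or '0.0' to a float."""
    text = text.strip().lower()
    if text.startswith("e"):
        text = "1" + text  # 'e-167' is shorthand for '1e-167'
    return float(text)


# (label, Expect) pairs taken from Table 60G; labels are abbreviated for the example.
hits = [
    ("GP2 homolog [Mus musculus]", "4e-12"),
    ("tectorin beta [Mus musculus]", "e-159"),
    ("RIKEN cDNA [Mus musculus]", "4e-13"),
    ("tectorin beta [Homo sapiens]", "e-167"),
    ("BETA-TECTORIN PRECURSOR (chick)", "e-125"),
]

# A smaller Expect value indicates a more significant match.
ranked = sorted(hits, key=lambda hit: parse_expect(hit[1]))
```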





Table 60H lists the domain descriptions from DOMAIN analysis results against 
NOV60c. This indicates that the NOV60c sequence has properties similar to those of other 
proteins known to contain this domain. 




Table 60H. Domain Analysis of NOV60c 

gnl|Smart|smart00241, ZP, Zona pellucida (ZP) domain; ZP proteins
are responsible for sperm-adhesion to the zona pellucida. ZP domains are also
present in multidomain transmembrane proteins such as glycoprotein GP2,
uromodulin and TGF-beta receptor type III (betaglycan).

CD-Length = 253 residues, 98.0% aligned

Score = 119 bits (297), Expect = 3e-28



Legan et al. (1997) cloned mouse alpha- and beta-tectorins. The mouse beta-tectorin
gene encodes a 320-amino acid protein containing a hydrophobic secretory signal sequence
and 4 potential N-glycosylation sites. Both alpha- and beta-tectorin contain a zona pellucida
domain, but otherwise are not homologous.

To identify genes expressed in the vertebrate inner ear, Heller et al. (1998) established
an assay that allowed rapid analysis of the differential expression pattern of mRNAs derived
from an auditory epithelium-specific cDNA library. They performed subtractive hybridization
to create an enriched probe, which was then used to screen the cDNA library. After
digoxigenin-labeled antisense cRNAs had been transcribed from hybridization-positive clones,
they conducted in situ hybridization on slides bearing cryosections of late embryonic chicken
heads, bodies, and cochleas. They found 12 proteins whose mRNAs were specifically or
highly expressed in the chicken's inner ear; the remainder encoded proteins that occur more
widely. They identified proteins that had previously been described as expressed in the inner
ear, such as beta-tectorin, calbindin (CALB1; 114050), and type II collagen (COL2A1;
120140). A second group of proteins abundant in the inner ear included 5 additional types of
collagen. A third group, including COCH5B2 (COCH; 603196) and ear-specific connexin,
comprised the proteins whose human equivalents are candidates to account for hearing
disorders. This last group also included proteins expressed in 2 cell types unique to the inner
ear, homogene cells and cells of the tegmentum vasculosum.

The disclosed NOV60c nucleic acid of the invention encoding a beta tectorin-like
protein includes the nucleic acid whose sequence is provided in Table 60A or a fragment
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may
be changed from the corresponding base shown in Table 60A while still encoding a protein
that maintains its beta tectorin-like activities and physiological functions, or a fragment of
such a nucleic acid. The invention further includes nucleic acids whose sequences are
complementary to those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 13 percent of the bases may be so changed.
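
For illustration, the complement of a disclosed nucleic acid, read in the conventional 5' to 3' direction, can be derived as sketched below. This is a minimal sketch that assumes an unmodified DNA alphabet (A, C, G, T); the helper name `reverse_complement` is hypothetical, and modified bases or backbones of the kind described above are outside its scope.

```python
# Watson-Crick pairing for unmodified DNA bases, upper and lower case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the complementary strand written 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


# The first three codons of the NOV60 coding region (ATG GTG ACG).
sense = "ATGGTGACG"
antisense = reverse_complement(sense)
```

Applying the function twice returns the original sequence, which is a convenient check when preparing antisense fragments.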

The disclosed NOV60 protein of the invention includes the beta tectorin-like protein
whose sequence is provided in Table 60B. The invention also includes a mutant or variant
protein any of whose residues may be changed from the corresponding residue shown in Table
60B while still encoding a protein that maintains its beta tectorin-like activities and
physiological functions, or a functional fragment thereof. In the mutant or variant protein, up
to about 6 percent of the residues may be so changed.
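
The percentage limits stated above (about 13 percent of bases, about 6 percent of residues) can be checked for a candidate variant as sketched below. The sketch rests on simplifying assumptions that are not part of the disclosure: the variant has the same length as the reference, only substitutions are counted, and the word "about" is treated as a strict numeric ceiling. The function names are hypothetical.

```python
def percent_changed(reference: str, variant: str) -> float:
    """Percentage of positions at which the variant differs from the reference."""
    if len(reference) != len(variant):
        raise ValueError("this sketch only compares sequences of equal length")
    if not reference:
        raise ValueError("sequences must not be empty")
    differences = sum(a != b for a, b in zip(reference, variant))
    return 100.0 * differences / len(reference)


def within_variation_limit(reference: str, variant: str, max_percent: float) -> bool:
    """True if the variant changes no more than max_percent of positions."""
    return percent_changed(reference, variant) <= max_percent


# First 20 residues of Table 60B, with one and then two illustrative substitutions.
reference = "MVTKAFVLLAIFAEASAKSC"
one_change = "MVTKAFVLLAIFAEASAKSA"
two_changes = "MVTKAFVLLAIFAEASAKAA"
```

On this 20-residue fragment a single substitution is a 5 percent change and falls within a 6 percent limit, whereas two substitutions (10 percent) do not.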

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this beta tectorin-like
protein (NOV60) may function as a member of a "beta tectorin family". Therefore, the
NOV60 nucleic acids and proteins identified here may be useful in potential therapeutic
applications implicated in (but not limited to) various pathologies and disorders as indicated
below. The potential therapeutic applications for this invention include, but are not limited to:
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV60 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the beta tectorin-like
protein (NOV60) may be useful in gene therapy, and the beta tectorin-like protein (NOV60)
may be useful when administered to a subject in need thereof. By way of nonlimiting
example, the compositions of the present invention will have efficacy for treatment of patients
suffering from hearing loss, or other pathologies or conditions. The NOV60 nucleic acid
encoding the beta tectorin-like protein of the invention, or fragments thereof, may further be
useful in diagnostic applications, wherein the presence or amount of the nucleic acid or the
protein are to be assessed.

NOV60 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV60 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV60 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.

NOV61 

A disclosed NOV61 nucleic acid of 3802 nucleotides (also referred to as CG57505-01)
encoding a KIAA1125-like protein is shown in Table 61A. The start and stop codons are
in bold letters.



Table 61A. NOV61 nucleotide sequence (SEQ ID NO: 143).

gggcagccaatcggggatgagcttttattaggcggccagatcatcagccgaagtgccaaa

ccctttttctgtgagaactaggagcctgtcctccatgttttataagtattgacattacac 

agtgttaacaatgcatccacagagcttggctgaagaggaaataaaaacagaacaggaggt

ggtagagggcatggatatctctactcgctccaaagatcctggctctgcagagagaacagc 

ccagaaaagaaagttccccagccctccacattcttccaatggccactcgccgcaggacac 

atcaacaagccccattaaaaagaaaaagaaacctggcttactgaacagtaacaataagga 

gcagtcagaactaagacatggtccgttttactatatgaagcagccactcaccacagaccc 

tgttgatgttgtaccgcaggatggacggaatgatttctactgctgggtttgtcaccggga 

aggccaagtcctttgctgtgagctctgtccccgggtttatcacgctaagtgtctgagact 

gacatcggaaccagagggggactggttttgtcctgaatgtgagaaaattacagtagcaga 

atgcatcgagacccagagtaaagccatgacaatgctcaccattgaacagttatcctacct
GCTCAAGTTTGCCATTCAGAAAATGAAACAGCCAGGGACAGATGCATTCCAGAAGCCCGT
TCCATTGGAACAGCACCCTGACTATGCGGAATACATCTTCCATCCAATGGACCTTTGTAC 
ATTGGAAAAGAATGCGAAAAAGAAAATGTATGGCTGCACAGAAGCCTTCCTGGCTGATGC 
AAAGTGGATTTTGCACAACTGCATCATTTATAATGGGGGAAATCACAAATTGACGCAAAT 
AGCGAAAGTAGTCATCAAAATCTGTGAACATGAGATGAATGAAATCGAAGTATGTCCAGA 
ATGTTATCTAGCTGCTTGCCAAAAACGAGATAACTGGTTTTGTGAGCCTTGTAGCAATCC 
ACATCCTTTGGTCTGGGCCAAACTGAAGGGGTTTCCATTCTGGCCTGCAAAAGCTCTAAG 
GGATAAAGACGGGCAGGTCGATGCCCGATTCTTTGGACAACATGACAGGGCCTGGGTTCC 
AATAAATAATTGCTACCTCATGTCTAAAGAAATTCCTTTTTCTGTGAAAAAGACTAAGAG 
CATCTTCAACAGTGCCATGCAAGAGATGGAGGTTTACGTGGAGAACATCCGCAGGAAGTT 
TGGGGTTTTTAATTACTCTCCATTTAGGACACCCTACACACCCAACAGCCAGTATCAAAT 
GCTGCTCGATCCCACCAACCCCAGCGCCGGCACTGCCAAGATAGACAAGCAGGAGAAGGT 
CAAGCTCAACTTTGACATGACGGCATCCCCCAAGATCCTGATGAGCAAGCCTGTGCTGAG 
TGGGGGCACAGGCCGCCGGATTTCCTTGTCGGATATGCCGCGCTCCCCCATGAGCACAAA 
CTCTTCTGTGCACACGGGCTCCGACGTGGAGCAGGATGCTGAGAAGAAGGCCACGTCGAG 
CCACTTCAGTGCGAGCGAGGAGTCCATGGACTTCCTGGATAAGAGCACAGCTTCACCAGC 
CTCCACCAAGACGGGACAAGCAGGGAGTTTATCCGGCAGCCCAAAGCCCTTCTCTCCTCA 
ACTGTCAGCTCCTATCACGACGAAAACGGACAAAACCTCCACCACCGGCAGCATCCTGAA 
TCTTAACCTGGATCGAAGCAAAGCTGAGATGGATTTGAAGGAGCTGAGCGAGTCGGTCCA 
GCAACAGTCCACCCCTGTTCCTCTCATCTCTCCCAAGCGCCAGATTCGTAGCAGGTTCCA 
GCTGAATCTTGACAAGACCATAGAGAGTTGCAAAGCACAATTAGGCATAAATGAAATCTC 
GGAAGATGTCTATACGGCCGTAGAGCACAGCGATTCGGAGGATTCTGAGAAGTCAGATAG 
TAGCGATAGTGAGTATATCAGTGATGATGAGCAGAAGTCTAAGAACGAGCCAGAAGACAC 
AGAGGACAAAGAAGGTTGTCAGATGGACAAAGAGCCATCTGCTGTTAAAAAAAAGCCCAA 
GCCTACAAACCCAGTGGAGATTAAAGAGGAGCTGAAAAGCACGTCACCAGCCAGCGAGAA
GGCAGACCCTGGAGCAGTCAAGGACAAGGCCAGCCCTGAGCCTGAGAAGGACTTTTCCGA 
AAAGGCAAAACCTTCACCTCACCCCATAAAGGATAAACTGAAGGGAAAAGATGAGACGGA 
TTCCCCAACAGTCCATTTGGGCCTGGACTCTGATTCAGAGAGCGAACTTGTCATAGATTT 
AGGAGAAGACCATTCTGGGCGGGAGGGTCGAAAAAATAAGAAGGAACCCAAAGAACCATC 
TCCCAAACAGGATGTTGTAGGTAAAACTCCACCATCCACGACGGTGGGCAGCCATTCTCC 
CCCGGAAACACCGGTGCTCACCCGCTCTTCCGCCCAAACTTCCGCGGCTGGCGCCACAGC 
CACCACCAGCACGTCCTCCACGGTCACCGTCACGGCCCCGGCCCCCGCCGCCACAGGAAG 
CCCAGTGAAAAAGCAGAGGCCGCTTTTACCGAAGGAGACTGCCCCGGCCGTGCAGCGGGT 
CGTGTGGAACTCATCAAGTAAGTTTCAAACGTCCTCCCAAAAGTGGCACATGCAGAAGAT 
GCAGCGTCAGCAGCAGCAGCAGCAGCAGCAAAACCAGCAGCAGCAGCCTCAGTCTTCCCA 
GGGGACGAGATATCAGACCAGACAGGCTGTGAAAGCTGTCCAGCAGAAGGAGATCACACA 
GAGCCCATCCACGTCCACCATCACCCTGGTGACCAGCACACAGTCATCGCCCCTGGTCAC 
CAGCTCGGGGTCCATGAGCACCCTTGTGTCCTCAGTCAACGCTGACCTGCCCATCGCCAC 
TGCCTCAGCTGATGTCGCCGCTGATATTGCCAAGTACACTAGCAAAATGATGGATGCAAT 
AAAAGGAACAATGACAGAAATATACAACGATCTTTCTAAAAACACTACTGGAAGCACAAT 
AGCTGAGATTCGCAGGCTGAGGATCGAGATAGAGAAGCTCCAGTGGCTGCACCAGCAAGA 
GCTCTCCGAAATGAAACACAACTTAGAGCTGACCATGGCGGAGATGCGGCAGAGCCTGGA 
GCAGGAGCGGGACCGGCTCATCGCCGAGGTGAAGAAGCAGCTGGAGTTGGAGAAGCAGCA 
GGCGGTGGATGAGACCAAGAAGAAGCAGTGGTGCGCCAACTGCAAGAAGGAGGCCATCTT 
TTACTGCTGTTGGAACACTAGCTACTGTGACTACCCCTGCCAGCAAGCCCACTGGCCTGA 
GCACATGAAGTCCTGCACCCAGTCAGCTACTGCTCCTCAGCAGGAAGCGGATGCTGAGGT 
GAACACAGAAACACTAAATAAGTCCTCCCAGGGGAGCTCCTCGAGCACACAATCAGCACC 
TTCAGAAACGGCCAGCGCCTCCAAAGAGAAGGAGACGTCAGCTGAGAAAAGCAAGGAGAG 
TGGCTCGACCCTTGACCTTTCTGGCTCCAGAGAGACGCCCTCCTCCATTCTCTTAGGCTC 
CAACCAAGGCTCTGACCATTCCCGGAGTAATAAATCCAGTTGGAGCAGCAGTGATGAGAA 
GAGGGGATCGACACGTTCCGATCACAACACCAGTACCAGCACGAAGAGCCTCCTCCCGAA 
AGAGTCTCGGCTGGACACCTTCTGGGACTAGCAGTGAATCGGGACACAAACCACCCACCC
CATTGGGAGAAAAACCCAGACGCCAGGAAAAGAAGAAACAACAAAGGCAGGAGAACAGCC
ACTTTCAGACTTGAAAATGACAAAACCCTCAGTTGAGCCTGAGCCCCCGGCGCGGGGGCT 
GCTACACTA 



In a search of public sequence databases, the NOV61 nucleic acid sequence, located on
chromosome 20, has 3797 of 3827 bases (99%) identical to a
gb:GENBANK-ID:AB032951|acc:AB032951.1 mRNA from Homo sapiens (Homo sapiens mRNA for
KIAA1125 protein, partial cds). Public nucleotide databases include all GenBank databases
and the GeneSeq patent database.

The disclosed NOV61 polypeptide (SEQ ID NO: 144) encoded by SEQ ID NO: 143 has
1206 amino acid residues and is presented in Table 61B using the one-letter amino acid code.
SignalP, Psort and/or Hydropathy results predict that NOV61 has no signal peptide and is
likely to be localized in the nucleus with a certainty of 0.7000.



Table 61B. Encoded NOV61 protein sequence (SEQ ID NO: 144).

MHPQSLAEEEIKTEQEVVEGMDISTRSKDPGSAERTAQKRKFPSPPHSSNGHSPQDTSTS
PIKKKKKPGLLNSNNKEQSELRHGPFYYMKQPLTTDPVDVVPQDGRNDFYCWVCHREGQV
LCCELCPRVYHAKCLRLTSEPEGDWFCPECEKITVAECIETQSKAMTMLTIEQLSYLLKF
AIQKMKQPGTDAFQKPVPLEQHPDYAEYIFHPMDLCTLEKNAKKKMYGCTEAFLADAKWI
LHNCIIYNGGNHKLTQIAKVVIKICEHEMNEIEVCPECYLAACQKRDNWFCEPCSNPHPL
VWAKLKGFPFWPAKALRDKDGQVDARFFGQHDRAWVPINNCYLMSKEIPFSVKKTKSIFN
SAMQEMEVYVENIRRKFGVFNYSPFRTPYTPNSQYQMLLDPTNPSAGTAKIDKQEKVKLN
FDMTASPKILMSKPVLSGGTGRRISLSDMPRSPMSTNSSVHTGSDVEQDAEKKATSSHFS
ASEESMDFLDKSTASPASTKTGQAGSLSGSPKPFSPQLSAPITTKTDKTSTTGSILNLNL
DRSKAEMDLKELSESVQQQSTPVPLISPKRQIRSRFQLNLDKTIESCKAQLGINEISEDV
YTAVEHSDSEDSEKSDSSDSEYISDDEQKSKNEPEDTEDKEGCQMDKEPSAVKKKPKPTN
PVEIKEELKSTSPASEKADPGAVKDKASPEPEKDFSEKAKPSPHPIKDKLKGKDETDSPT
VHLGLDSDSESELVIDLGEDHSGREGRKNKKEPKEPSPKQDVVGKTPPSTTVGSHSPPET
PVLTRSSAQTSAAGATATTSTSSTVTVTAPAPAATGSPVKKQRPLLPKETAPAVQRVVWN
SSSKFQTSSQKWHMQKMQRQQQQQQQQNQQQQPQSSQGTRYQTRQAVKAVQQKEITQSPS
TSTITLVTSTQSSPLVTSSGSMSTLVSSVNADLPIATASADVAADIAKYTSKMMDAIKGT
MTEIYNDLSKNTTGSTIAEIRRLRIEIEKLQWLHQQELSEMKHNLELTMAEMRQSLEQER
DRLIAEVKKQLELEKQQAVDETKKKQWCANCKKEAIFYCCWNTSYCDYPCQQAHWPEHMK
SCTQSATAPQQEADAEVNTETLNKSSQGSSSSTQSAPSETASASKEKETSAEKSKESGST
LDLSGSRETPSSILLGSNQGSDHSRSNKSSWSSSDEKRGSTRSDHNTSTSTKSLLPKESR
LDTFWD



A search of sequence databases reveals that the NOV61 amino acid sequence has 1201
of 1201 amino acid residues (100%) identical to, and 1201 of 1201 amino acid residues
(100%) similar to, the 1205 amino acid residue ptnr:SPTREMBL-ACC:Q9ULU4 protein from
Homo sapiens (Human) (KIAA1125 PROTEIN). Public amino acid databases include the
GenBank databases, SwissProt, PDB and PIR.

NOV61 is expressed in at least Brain, Lymphoid tissue, Kidney, Whole Organism,
Bone Marrow, Prostate, Lung, Lung Pleura, Retina. This information was derived by
determining the tissue sources of the sequences that were included in the invention including
but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE
sources.

The disclosed NOV61 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 61C.

Table 61C. BLAST results for NOV61


Gene Index/Identifier                            Protein/Organism                        Length (aa)  Identity (%)      Positives (%)     Expect
gi|6329749|dbj|BAA86439.1| (AB032951)            KIAA1125 protein [Homo sapiens]         1205         1201/1201 (100%)  1201/1201 (100%)  0.0
gi|17980969|gb|AAL50790.1|AF454056_1 (AF454056)  se14-3r protein [Homo sapiens]          995          992/1041 (95%)    994/1041 (95%)    0.0
gi|14786224|ref|XP (XM_012932)                   similar to protein kinase C binding     949          932/933 (99%)     933/933 (99%)     0.0
                                                 protein 1 (H. sapiens) [Homo sapiens]
gi|11385648|gb|AAG34905.1|AF273045_1 (AF273045)  CTCL tumor antigen se14-3               764          762/810 (94%)     763/810 (94%)     0.0
                                                 [Homo sapiens]
gi|13677199|emb|CAC19781.1| (AL031666)           dJ569M23.1.1 (protein kinase C binding  521          520/520 (100%)    520/520 (100%)    0.0
                                                 protein 1, isoform 1 (KIAA1125))
                                                 [Homo sapiens]





Tables 61D-F list the domain descriptions from DOMAIN analysis results against
NOV61. This indicates that the NOV61 sequence has properties similar to those of other
proteins known to contain these domains.



Table 61D. Domain Analysis of NOV61 

gnl|Smart|smart00297, BROMO, bromo domain

CD-Length = 109 residues, 94.5% aligned 

Score = 81.3 bits (199), Expect = 3e-16 



Table 61E. Domain Analysis of NOV61 

gnl|Pfam|pfam00628, PHD, PHD-finger. PHD folds into an interleaved
type of Zn-finger chelating 2 Zn ions in a similar manner to that of
the RING and FYVE domains.

CD-Length = 49 residues, 93.9% aligned

Score = 56.6 bits (135), Expect = 8e-09

Table 61F. Domain Analysis of NOV61

gnl|Pfam|pfam01753, zf-MYND, MYND finger.
CD-Length = 38 residues, 100.0% aligned

Score = 45.1 bits (105), Expect = 2e-05

Mukai and Ono (1994) isolated a cDNA for a protein kinase, designated PKN by them, 
from a human hippocampus cDNA library. The putative 942-amino acid protein has leucine 
zipper-like sequences at its amino terminus and contains a domain with strong similarity to 
that of the protein kinase C family. Ubiquitous expression in human tissues was shown. 
Antisera detected a 120-kD recombinantly expressed protein on Western blots. The protein
showed intrinsic protein kinase activity that was abolished by a mutation in the predicted ATP 
binding site. 

Palmer et al. (1994) used degenerate PCR to isolate 3 novel members of the closely 
related protein kinase C (PKC) family, termed PRK1, PRK2 (602549), and PRK3. Palmer et 
al. (1995) cloned a full-length cDNA of PRK1 from a human fetal brain library. Using 
Northern blot and RT-PCR analyses Palmer et al. (1995) detected expression of PRK1 in all 
tissues and cell lines tested. 

In a study of proteins that bind to the rho GTPase (see Ridley and Hall, 1992), Amano 
et al. (1996) discovered 1 protein that had partial amino acid sequences identical to PKN. They 
found that rho binds directly to a polybasic region of the N-terminal regulatory domain that 
precedes the leucine zipper-like motif. The authors speculated that through this activity, PKN 
may mediate the rho-dependent signaling pathway. 

Bartsch et al. (1998) used fluorescence in situ hybridization to map the PRKCL1 gene
to 19p13.1-p12 and radiation hybrid mapping to localize the gene in subband 19p12. By
segregation analysis, they mapped the corresponding mouse gene (Prkcl1) to chromosome 8.

The disclosed NOV61 nucleic acid of the invention encoding a KIAA1125-like protein
includes the nucleic acid whose sequence is provided in Table 61A or a fragment thereof.
The invention also includes a mutant or variant nucleic acid any of whose bases may be
changed from the corresponding base shown in Table 61A while still encoding a protein that
maintains its KIAA1125-like activities and physiological functions, or a fragment of such a
nucleic acid. The invention further includes nucleic acids whose sequences are
complementary to those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 1 percent of the bases may be so changed.

The disclosed NOV61 protein of the invention includes the KIAA1125-like protein
whose sequence is provided in Table 61B. The invention also includes a mutant or variant
protein any of whose residues may be changed from the corresponding residue shown in Table
61B while still encoding a protein that maintains its KIAA1125-like activities and
physiological functions, or a functional fragment thereof. In the mutant or variant protein, up
to about 0 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this KIAA1125-like
protein (NOV61) may function as a member of a "KIAA1125 family". Therefore, the NOV61
nucleic acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV61 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the KIAA1125-like protein
(NOV61) may be useful in gene therapy, and the KIAA1125-like protein (NOV61) may be
useful when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from Cardiomyopathy, Atherosclerosis, Hypertension, Congenital heart defects, Aortic
stenosis, Atrial septal defect (ASD), Atrioventricular (A-V) canal defect, Ductus arteriosus,
Pulmonary stenosis, Subaortic stenosis, Ventricular septal defect (VSD), valve diseases,
Tuberous sclerosis, Scleroderma, Obesity, Transplantation, Aneurysm, Fibromuscular
dysplasia, Stroke, Anemia, Bleeding disorders, Adrenoleukodystrophy, Congenital Adrenal
Hyperplasia, Diabetes, Von Hippel-Lindau (VHL) syndrome, Pancreatitis,
Hyperparathyroidism, Hypoparathyroidism, SIDS, Endometriosis, Fertility, Xerostomia,
Hypercalcemia, Ulcers, Cirrhosis, Inflammatory bowel disease, Diverticular disease,
Hirschsprung's disease, Crohn's Disease, Appendicitis, Hemophilia, hypercoagulation,
Idiopathic thrombocytopenic purpura, autoimmune disease, allergies, immunodeficiencies,
Graft versus host, Ataxia-telangiectasia, Hemophilia, Lymphedema, Tonsillitis, Osteoporosis,
Arthritis, Ankylosing spondylitis, Scoliosis, Tendinitis, Muscular dystrophy, Lesch-Nyhan
syndrome, Myasthenia gravis, Dental disease and infection, Alzheimer's disease, Tuberous
sclerosis, Parkinson's disease, Huntington's disease, Cerebral palsy, Epilepsy, Lesch-Nyhan
syndrome, Multiple sclerosis, Ataxia-telangiectasia, Leukodystrophies, Behavioral disorders,
Addiction, Anxiety, Pain, Neuroprotection, Growth and reproductive disorders, Endocrine
dysfunctions, Systemic lupus erythematosus, Asthma, Emphysema, ARDS, Pharyngitis,
Laryngitis, Hearing loss, Tinnitus, Psoriasis, Actinic keratosis, Tuberous sclerosis, Acne, Hair
growth, alopecia, pigmentation disorders, cystitis, incontinence, Renal artery stenosis,
Interstitial nephritis, Glomerulonephritis, Polycystic kidney disease, Systemic lupus
erythematosus, Renal tubular acidosis, IgA nephropathy, Vesicoureteral reflux, glaucoma,
blindness, and Hypothyroidism, or other pathologies or conditions. The NOV61 nucleic acid
encoding the KIAA1125-like protein of the invention, or fragments thereof, may further be
useful in diagnostic applications, wherein the presence or amount of the nucleic acid or the
protein are to be assessed.

NOV61 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV61 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV61 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 

NOV62

NOV62 includes two zinc finger BOP-like proteins disclosed below. The disclosed
sequences have been named NOV62a and NOV62b.

NOV62a

A disclosed NOV62a nucleic acid of 1629 nucleotides (also referred to as CG57473-01)
encoding a zinc finger BOP-like protein is shown in Table 62A. The start and stop codons
are in bold letters.

Table 62A. NOV62a nucleotide sequence (SEQ ID NO:145).



TATAGTCTTGCTCTTTGGGATGCTG7^AGGTGCTGAAATAGCAATGACAAGAGACTTGGCT 
CAGTGTTAAATAACTGCCGCGCTGGCCTGACAGTCTCTGAGATGACAATAGGGAGAATGG
AGAACGTGGAGGTCTTCACCGCTGAGGGCAAAGGAAGGGGTCTGAAGGCCACCAAGGAGT 
TCTGGGCTGCAGATATCATCTTTGCTGAGCGGGCTTATTCCGCAGTGGTTTTTGACAGCC 
TTGTTAATTTTGTGTGCCACACCTGCTTCAAGAGGCAGGAGAAGCTCCATCGCTGTGGGC 
AGTGCAAGTTTGCCCATTACTGCGACCGCACCTGCCAGAAGGATGCTTGGCTGAACCACA 
AGAATGAATGTTCGGCCATCAAGAGATATGGGAAGGTGCCCAATGAGAACATCAGGCTGG
CGGCGCGCATCATGTGGCGGGTGGAGAGAGAAGGCACCGGGCTCACGGAGGGCTGCCTGG 
TGTCCGTGGACGACTTGCAGAACCACGTGGAGCACTTTGGGGAGGAGGAGCAGAAGGACC 
TGCGGGTGGACGTGGACACATTCTTGCAGTACTGGCCGCCGCAGAGCCAGCCGTTCAGCA 
TGCAGTACATCTCGCACATCTTCGGAGTGATTAACTGCAACGGTTTTACTCTCAGTGATC 
AGAGAGGCCTGCAGGCCGTGGGCGTAGGCATCTTCCCCAACCTGGGCCTGGTGAACCATG 
ACTGTTGGCCCAACTGTACTGTCATATTTAACAATGGCAATCATGAGGCAGTGAAATCCA 
TGTTTCATACCCAGATGAGAATTGAACTGCGGGCCCTAGGCAAGATCTCAGAAGGAGAGG 
AGCTGACTGTGTCCTATATCGACTTCCTCAACGTTAGTGAAGAACGCAAGAGGCAGCTGA 
AGAAGCAGTACTACTTTGACTGCACATGTGAACACTGCCAGAAAAAACTGAAGGATGACC
TCTTCCTGGGGGTGAAAGACAACCCCAAGCCCTCTCAGGAAGTGGTGAAGGAGATGATAC 
AATTCTCCAAGGATACATTGGAAAAGATAGACAAGGCTCGTTCCGAGGGTTTGTATCATG 
AGGTTGTGAAATTATGCCGGGAGTGCCTGGAGAAGCAGGAGCCAGTGTTTGCTGACACCA 
ACATCTACATGCTGCGGATGCTGAGCATTGTTTCGGAGGTCCTTTCCTACCTCCAGGCCT 
TTGAGGAGGCCTCGTTCTATGCCAGGAGGATGGTGGACGGCTATATGAAGCTCTACCACC 
CCAACAATGCCCAACTGGGCATGGCCGTGATGCGGGCAGGGCTGACCAACTGGCACGCTG 
GTAACATTGAGGTGGGGCACGGGATGATCTGCAAAGCCTATGCCATTCTCCTGGTGACAC 
ACGGACCCTCCCACCCCATCACTAAGGACTTAGAGGCCATGCGGGTGCAGACGGAGATGG 
AGCTACGCATGTTCCGCCAGAACGAATTCATGTACTACAAGATGCGCGAGGCTGCCCTGA 
ACAACCAGCCCATGCAGGTCATGGCCGAGCCCAGCAATGAGCCATCCCCAGCTCTGTTCC 
ACAAGAAGCAATGAGGACTGCCCAGTGGAGGAGGGGCGATGTGGCTGGGGAGCTAGGGAG
AGACTCTGG 



In a search of public sequence databases, the NOV62a nucleic acid sequence, located
on chromosome 2, has 1392 of 1573 bases (88%) identical to a
gb:GENBANK-ID:MMU76373|acc:U76373.2 mRNA from Mus musculus (Mus musculus skm-BOP1 (Bop)
mRNA, complete cds). Public nucleotide databases include all GenBank databases and the 
GeneSeq patent database. 
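
The identity percentages quoted in these database searches follow directly from the match counts. The following is a minimal Python sketch, assuming the percentages are truncated to whole numbers, which is consistent with the figures reported here (for example, 478 of 485 is reported as 98%). The function name `percent_identity` is hypothetical and is not part of the disclosure.

```python
def percent_identity(matches: int, aligned_length: int) -> int:
    """Return the whole-number percentage of matching positions, truncated."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    # Integer division truncates, matching the percentages quoted in the text.
    return (100 * matches) // aligned_length


# Figures quoted for NOV62a against the mouse skm-BOP1 sequences.
print(percent_identity(1392, 1573))  # nucleotide identity: 88
print(percent_identity(458, 485))    # amino acid identity: 94
print(percent_identity(478, 485))    # amino acid similarity: 98
```

Rounding to the nearest integer instead of truncating would give 99% for the last figure, so the truncation assumption matters when comparing against the reported values.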

The disclosed NOV62a polypeptide (SEQ ID NO:146) encoded by SEQ ID NO:145
has 490 amino acid residues and is presented in Table 62B using the one-letter amino acid 
code. Signal P, Psort and/or Hydropathy results predict that NOV62a has no signal peptide 
and is likely to be localized in the cytoplasm with a certainty of 0.6500. 
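
The relationship between the nucleotide sequence and the encoded polypeptide can be checked by translating the open reading frame. The sketch below is illustrative only: it assumes the standard genetic code, and the helper names `CODON_TABLE` and `translate` are hypothetical. The two input strings are the first ten codons after the start of the NOV62a coding region and the last four codons plus the stop codon, as read from Table 62A.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[i:i + 3].upper()]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGACAATAGGGAGAATGGAGAACGTGGAG"))  # MTIGRMENVE
print(translate("CACAAGAAGCAATGA"))                 # HKKQ
```

A polypeptide of 490 residues corresponds to a coding region of 490 x 3 + 3 = 1473 nucleotides including the stop codon, which fits within the 1629-nucleotide sequence.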



Table 62B. Encoded NOV62a protein sequence (SEQ ID NO:146). 



MTIGRMENVEVFTAEGKGRGLKATKEFWAADIIFAERAYSAVVFDSLVNFVCHTCFKRQE
KLHRCGQCKFAHYCDRTCQKDAWLNHKNECSAIKRYGKVPNENIRLAARIMWRVEREGTG
LTEGCLVSVDDLQNHVEHFGEEEQKDLRVDVDTFLQYWPPQSQPFSMQYISHIFGVINCN
GFTLSDQRGLQAVGVGIFPNLGLVNHDCWPNCTVIFNNGNHEAVKSMFHTQMRIELRALG
KISEGEELTVSYIDFLNVSEERKRQLKKQYYFDCTCEHCQKKLKDDLFLGVKDNPKPSQE
VVKEMIQFSKDTLEKIDKARSEGLYHEVVKLCRECLEKQEPVFADTNIYMLRMLSIVSEV
LSYLQAFEEASFYARRMVDGYMKLYHPNNAQLGMAVMRAGLTNWHAGNIEVGHGMICKAY
AILLVTHGPSHPITKDLEAMRVQTEMELRMFRQNEFMYYKMREAALNNQPMQVMAEPSNE
PSPALFHKKQ



A search of sequence databases reveals that the NOV62a amino acid sequence has 458 
of 485 amino acid residues (94%) identical to, and 478 of 485 amino acid residues (98%) 
similar to, the 485 amino acid residue ptnr:SPTREMBL-ACC:P97443 protein from Mus 
musculus (Mouse) (ZINC-FINGER PROTEIN BOP). Public amino acid databases include the
GenBank databases, SwissProt, PDB and PIR. 

NOV62a is expressed in at least Whole Organism, Heart, Lung, Prostate, and Skeletal
Muscle. This information was derived by determining the tissue sources of the sequences that
were included in the invention including but not limited to SeqCalling sources, Public EST
sources, Literature sources, and/or RACE sources.



NOV62b 

A disclosed NOV62b nucleic acid of 1555 nucleotides (also referred to as CG57473-01)
encoding a zinc finger BOP-like protein is shown in Table 62C. The start and stop codons
are in bold letters.



Table 62C. NOV62b nucleotide sequence (SEQ ID NO:147). 

AAGGTGCTGAAATAGCAATGACAAGAGACTTAGCTCAGTGTTAAATAACTGCCGCGCTGG 
CCTGACAGTTTCTGAGATGACAATAGGGAGAATGGAGAACGTGGAGGTCTTCACCGCTGA
GGGCAAAGGAAGGGGTCTGAAGGCCACCAAGGAGTTCTGGGCTGCAGATATCATCTTTGC 
TGATCGGGCTTATTCCGCAGTGGTTTTTGACAGCCTTGTTAATTTTGTGTGCCACACCTG 
CTTCAAGAGGCAGGAGAAGCTCCATCGCTGTGGGCAGTGCAAGTTTGCCCATTACTGCGA 
CCGCACCTGCCAGAAGGATGCTTGGCTGAACCACAAGAATGAATGTTCGGCCATCAAGAG 
ATATGGGAAGGTGCCCAATGAGAACATCAGGCTGGCGGCGCGCATCATGTGGAGGGTGGA 
GAGAGAAGGCACCGGGCTCACGGAGGGCTGCCTGGTGTCCGTGGACGACTTGCAGAACCA 
CGTGGAGCACTTTGGGGAGGAGGAGCAGAAGGACCTGCGGGTGGACGTGGACACATTCTT 
GCAGTACTGGCCGCCGCAGAGCCAGCAGTTCAGCATGCAGTACATCTCGCACATCTTCGG 
AGTGATTAACTGCAACGGTTTTACTCTCAGTGATCAGAGAGGCCTGCAGGCCGTGGGCGT 
AGGCATCTTCCCCAACCTGGGCCTGGTGAACCATGACTGTTGGCCCAACTGTACTGTCAT 
ATTTAACAATGGCAATCATGAGGCAGTGAAATCCATGTTTCATACCCAGATGAGAATTGA 
GCTCCGGGCCCTAGGCAAGATCTCAGAAGGAGAGGAGCTGACTGTGTCCTATATTGACTT 
CCTCAACGTTAGTGAAGAACGCAAGAGGCAGCTGAAGAAGCAGTACTACTTTGACTGCAC 
ATGTGAACACTGCCAGAAAAAACTGAAGGATGACCTCTTCCTGGGGGTGAAAGACAACCC 
CAAGCCCTCTCAGGAAGTGGTGAAGGAGATGATACAATTCTCCAAGGATACATTGGAAAA 
GATAGACAAGGCTCGTTCCGAGGGTTTGTATCATGAGGTTGTGAAATTATGCCGGGAGTG 
CCTGGAGAAGCAGGAGCCAGTGTTTGCTGACACCAACATCTACATGCTGCGGATGCTGAG 
CATTGTTTCGGAGGTCCTTTCCTACCTCCAGGCCTTTGAGGAGGCCTCGTTCTATGCCAG 
GAGGATGGTGGACGGCTATATGAAGCTCTACCACCCCAACAATGCCCAACTGGGCATGGT 
CGTGATGCGGGCAGGGCTGACCAACTGGCATGCTGGTAACATTGAGGTGGGGCACGGGAT 
GATCTGCAAAGCCTATGCCATTCTCCTGGTGACACACGGACCCTCCCACCCCATCACTAA 
GGACTTAGAGGCCATGCGGGTGCAGACGGAGATGGAGCTACGCATGTTCCGCCAGAACGA
ATTCATGTACTACAAGATGCGCGAGGCTGCCCTGAACAACCAGCCCATGCAGGTCATGGC
CGAGCCCAGCAATGAGCCATCCCCAGCTCTGTTCCACAAGAAGCAATGAGGACTG 



In a search of public sequence databases, the NOV62b nucleic acid sequence, located
on chromosome 2, has 1356 of 1525 bases (88%) identical to a
gb:GENBANK-ID:MMU76373|acc:U76373.2 mRNA from Mus musculus (Mus musculus skm-BOP1 (Bop)
mRNA, complete cds). Public nucleotide databases include all GenBank databases and the
GeneSeq patent database.

The disclosed NOV62b polypeptide (SEQ ID NO: 148) encoded by SEQ ID NO: 147 
has 490 amino acid residues and is presented in Table 62D using the one-letter amino acid 
code. Signal P, Psort and/or Hydropathy results predict that NOV62b has no signal peptide 
and is likely to be localized in the cytoplasm with a certainty of 0.6500.



Table 62D. Encoded NOV62b protein sequence (SEQ ID NO:148). 



MTIGRMENVEVFTAEGKGRGLKATKEFWAADIIFADRAYSAVVFDSLVNFVCHTCFKRQE
KLHRCGQCKFAHYCDRTCQKDAWLNHKNECSAIKRYGKVPNENIRLAARIMWRVEREGTG
LTEGCLVSVDDLQNHVEHFGEEEQKDLRVDVDTFLQYWPPQSQQFSMQYISHIFGVINCN
GFTLSDQRGLQAVGVGIFPNLGLVNHDCWPNCTVIFNNGNHEAVKSMFHTQMRIELRALG
KISEGEELTVSYIDFLNVSEERKRQLKKQYYFDCTCEHCQKKLKDDLFLGVKDNPKPSQE
VVKEMIQFSKDTLEKIDKARSEGLYHEVVKLCRECLEKQEPVFADTNIYMLRMLSIVSEV
LSYLQAFEEASFYARRMVDGYMKLYHPNNAQLGMVVMRAGLTNWHAGNIEVGHGMICKAY
AILLVTHGPSHPITKDLEAMRVQTEMELRMFRQNEFMYYKMREAALNNQPMQVMAEPSNE
PSPALFHKKQ



A search of sequence databases reveals that the NOV62b amino acid sequence has 457 
of 485 amino acid residues (94%) identical to, and 478 of 485 amino acid residues (98%) 
similar to, the 485 amino acid residue ptnr:SPTREMBL-ACC:P97443 protein from Mus 
musculus (Mouse) (ZINC-FINGER PROTEIN BOP). Public amino acid databases include the
GenBank databases, SwissProt, PDB and PIR. 

NOV62b is expressed in at least Whole Organism, Heart, Lung, Prostate, and Skeletal
Muscle. This information was derived by determining the tissue sources of the sequences that
were included in the invention including but not limited to SeqCalling sources, Public EST
sources, Literature sources, and/or RACE sources.

The disclosed NOV62b polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 62E.

Table 62E. BLAST results for NOV62b

Gene Index/Identifier                             Protein/Organism                                            Length (aa)  Identity (%)   Positives (%)  Expect
gi|5870832|gb|AAC53021.2| (U76373)                skm-BOP1 [Mus musculus]                                     485          458/485 (94%)  478/485 (98%)  0.0
gi|10257425|ref|NP_033892.1| (NM_009762)          CD8beta opposite strand [Mus musculus]                      472          444/485 (91%)  465/485 (95%)  0.0
gi|1809322|gb|AAC53020.1| (U76371)                t-BOP [Mus musculus]                                        456          419/447 (93%)  437/447 (97%)  0.0
gi|16930387|gb|AAL31880.1|AF410781_1 (AF410781)   cardiac and skeletal muscle-specific BOP1 [Gallus gallus]   486          397/486 (81%)  441/486 (90%)  0.0
gi|16930389|gb|AAL31881.1|AF410782_1 (AF410782)   cardiac and skeletal muscle-specific BOP2 [Gallus gallus]   473          384/486 (79%)  428/486 (88%)  0.0


Tables 62F and 62G list the domain descriptions from DOMAIN analysis results against
NOV62b. This indicates that the NOV62b sequence has properties similar to those of other
proteins known to contain these domains.



Table 62F. Domain Analysis of NOV62b 

gnl|Pfam|pfam01753, zf-MYND, MYND finger.
CD-Length = 38 residues, 100.0% aligned

Score = 57.8 bits (138), Expect = 1e-09



Table 62G. Domain Analysis of NOV62b 

gnl|Smart|smart00317, SET, SET (Su(var)3-9, Enhancer-of-zeste,
Trithorax) domain; Putative methyl transferase, based on outlier plant
homologues

CD-Length = 125 residues, 44.0% aligned

Score = 53.9 bits (128), Expect = 2e-08



Transcriptional regulatory proteins containing tandemly repeated zinc finger domains
are thought to be involved in both normal and abnormal cellular proliferation and
differentiation. One abundant class of such transcriptional regulators resembles the Drosophila
Kruppel segmentation gene product due to the presence of repeated Cys2-His2 (C2H2) zinc
finger domains that are connected by conserved sequences, called H/C links. See ZNF91
(603971) for general information on zinc finger proteins.

By screening a human insulinoma cDNA library with a degenerate oligonucleotide
corresponding to the H/C linker sequence, Tommerup et al. (1993) isolated cDNAs potentially
encoding zinc finger proteins. Tommerup and Vissing (1995) performed sequence analysis on
a number of these cDNAs and identified several novel zinc finger protein genes, including
ZNF36, which they called ZNF139. The ZNF139 cDNA predicts a protein belonging to the
Kruppel family of zinc finger proteins.

By isotopic in situ hybridization, Rousseau-Merck et al. (1995) mapped the ZNF36
gene, which they called KOX18, to 7q21-q22. From pulsed field gel electrophoresis studies,
they showed that KOX18 is within less than 250 kb of KOX25 (ZNF38; 601261).
Rousseau-Merck et al. (1995) tabulated 18 different KOX genes that had been located in pairs within 9
DNA fragments of 200 to 580 kb on 7 different chromosomes. By FISH, Tommerup and
Vissing (1995) mapped the ZNF36 gene to 7q21.3-q22.1.

The disclosed NOV62 nucleic acid of the invention encoding a zinc finger BOP-like
protein includes the nucleic acid whose sequence is provided in Table 62A or 62C or a
fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose
bases may be changed from the corresponding base shown in Table 62A or 62C while still
encoding a protein that maintains its zinc finger BOP-like activities and physiological
functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids
whose sequences are complementary to those just described, including nucleic acid fragments
that are complementary to any of the nucleic acids just described. The invention additionally
includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures
include chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 12 percent of the bases may be so changed.
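
To make the "up to about 12 percent" limit concrete, the following Python sketch converts a percentage limit into a number of positions and measures the fraction of positions that differ between two sequences. It is an illustration under stated assumptions: the text says "about", so rounding down is an assumption made here; only substitutions between equal-length sequences are counted; and the function names are hypothetical.

```python
import math


def max_changed_positions(length: int, percent: float) -> int:
    """Largest whole number of positions within the given percentage limit."""
    return math.floor(length * percent / 100)


def fraction_changed(reference: str, variant: str) -> float:
    """Fraction of positions that differ between two equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(reference, variant) if a != b)
    return differences / len(reference)


print(max_changed_positions(1629, 12))  # NOV62a nucleic acid: 195
print(max_changed_positions(1555, 12))  # NOV62b nucleic acid: 186
print(fraction_changed("ATGACAATAG", "ATGACTATAG"))  # 0.1
```

Variants that contain insertions or deletions would first need to be aligned; that step is outside the scope of this sketch.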

The disclosed NOV62 protein of the invention includes the zinc finger BOP-like
protein whose sequence is provided in Table 62B or 62D. The invention also includes a
mutant or variant protein any of whose residues may be changed from the corresponding
residue shown in Table 62B or 62D while still encoding a protein that maintains its zinc finger
BOP-like activities and physiological functions, or a functional fragment thereof. In the
mutant or variant protein, up to about 6 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this zinc finger BOP-like
protein (NOV62) may function as a member of a "zinc finger BOP family". Therefore, the
NOV62 nucleic acids and proteins identified here may be useful in potential therapeutic
applications implicated in (but not limited to) various pathologies and disorders as indicated
below. The potential therapeutic applications for this invention include, but are not limited to:
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV62 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the zinc finger BOP-like
protein (NOV62) may be useful in gene therapy, and the zinc finger BOP-like protein
(NOV62) may be useful when administered to a subject in need thereof. By way of
nonlimiting example, the compositions of the present invention will have efficacy for
treatment of patients suffering from Cardiomyopathy, Atherosclerosis, Hypertension,
Congenital heart defects, Aortic stenosis, Atrial septal defect (ASD), Atrioventricular (A-V)
canal defect, Ductus arteriosus, Pulmonary stenosis, Subaortic stenosis, Ventricular septal
defect (VSD), valve diseases, Tuberous sclerosis, Scleroderma, Obesity, Transplantation,
Aneurysm, Fibromuscular dysplasia, Stroke, Anemia, Bleeding disorders,
Adrenoleukodystrophy, Congenital Adrenal Hyperplasia, Diabetes, Von Hippel-Lindau
(VHL) syndrome, Pancreatitis, Hyperparathyroidism, Hypoparathyroidism, SIDS,
Endometriosis, Fertility, Xerostomia, Hypercalcemia, Ulcers, Cirrhosis, Inflammatory bowel
disease, Diverticular disease, Hirschsprung's disease, Crohn's Disease, Appendicitis,
Hemophilia, hypercoagulation, Idiopathic thrombocytopenic purpura, autoimmune disease,
allergies, immunodeficiencies, Graft versus host, Ataxia-telangiectasia, Hemophilia,
Lymphedema, Tonsillitis, Osteoporosis, Arthritis, Ankylosing spondylitis, Scoliosis,
Tendinitis, Muscular dystrophy, Lesch-Nyhan syndrome, Myasthenia gravis, Dental disease
and infection, Alzheimer's disease, Tuberous sclerosis, Parkinson's disease, Huntington's
disease, Cerebral palsy, Epilepsy, Lesch-Nyhan syndrome, Multiple sclerosis,
Ataxia-telangiectasia, Leukodystrophies, Behavioral disorders, Addiction, Anxiety, Pain,
Neuroprotection, Growth and reproductive disorders, Endocrine dysfunctions, Systemic lupus
erythematosus, Asthma, Emphysema, ARDS, Pharyngitis, Laryngitis, Hearing loss, Tinnitus,
Psoriasis, Actinic keratosis, Tuberous sclerosis, Acne, Hair growth, alopecia, pigmentation
disorders, cystitis, incontinence, Renal artery stenosis, Interstitial nephritis,
Glomerulonephritis, Polycystic kidney disease, Systemic lupus erythematosus, Renal tubular
acidosis, IgA nephropathy, Vesicoureteral reflux, glaucoma, blindness, and Hypothyroidism,
or other pathologies or conditions. The NOV62 nucleic acid encoding the zinc finger BOP-like
protein of the invention, or fragments thereof, may further be useful in diagnostic applications,
wherein the presence or amount of the nucleic acid or the protein is to be assessed.

NOV62 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV62 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV62 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 
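
Prediction of hydrophilic regions from a hydrophobicity chart can be sketched as a sliding-window average over per-residue hydropathy values. The Python example below is a minimal illustration, not the method of the disclosure: it assumes the Kyte-Doolittle hydropathy scale, and the window size, the threshold and the function names are hypothetical choices made for the example.

```python
# Kyte-Doolittle hydropathy values (assumed scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list:
    """Average hydropathy of every window of the given size along the protein."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophilic_windows(protein: str, window: int = 9, threshold: float = -1.5) -> list:
    """Start indices (0-based) of windows whose average is at or below threshold."""
    profile = hydropathy_profile(protein, window)
    return [i for i, average in enumerate(profile) if average <= threshold]


# First 60 residues of the NOV62a polypeptide as given in Table 62B.
segment = "MTIGRMENVEVFTAEGKGRGLKATKEFWAADIIFAERAYSAVVFDSLVNFVCHTCFKRQE"
print(hydrophilic_windows(segment))
```

Windows with strongly negative averages are candidate surface-exposed stretches; whether such a stretch is a useful immunogen would still need to be established experimentally.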



NOV63

A disclosed NOV63 nucleic acid of 3647 nucleotides (also referred to as CG57777-01)
encoding a secreted protein-like protein is shown in Table 63A. The start and stop codons are
in bold letters.



Table 63A. NOV63 nucleotide sequence (SEQ ID NO:149). 

GACCTACAGAGAGACTTAGACTCCTACACAATAGTAGTGGGAGACTTTAACACCCCACTGTCAACATTAG 
ACAGATTGAGACAGAAAATTAACAAGGATATTCAGTACTTGAACTCAGCTCTGGACCAAGCAGACCTAAT
AGACATCTACAGAACTCTCCACTCCAAATCAACAGAATATGCATTTTTCTCAGCACTACATCACACTTAT 
TCTAAAATTCACCACATAATTGGAAGTAAAACACTCCTTAGCAAATGCAAAAGAATGGAAATCATAACAA
ACAGTCTCTCAGACCACAGTGCAATCAAATTAGAACTCAGGATTAAGAAACTCACTCAAAACCACACAAC 
TACATGGAAACTGAAAAACCTGCTCCTGAATAACTACTTGGTAAATAATGAAATTAAGGCAGAAATAAAT 
AAGTTCTGTGAAACCAATGAGAACAAAGACACAACGTACCAGAATTTCTGGGACACAGCTAAAGCAGTGG 
TTAGAGGGAAATTTATAGCACTAAATGCGCACAGGAGAAAGCAAGAAAGATGTAAAATCAACACCCTAAC
ATCACAATTAAAAGAACTAGAGAAGCAAGAGCAAACAAATTCAAAAGCTAACAGAAGACAAGAAATAACT
AAGATCATAGCAGAACTGAAGGAGATAAAGACACGAAAAACCC^TCAAAAAATCAATGAATCTGGGAGCT 
GGTTTTTTGAAAAGATTAACAAAATAGATAGACAACTAGCCAGACTAATAAAGAAGAGAAGAGAGAAGAA 
TCAAATAGATGCAATAAAAAATGATAAAGGGGATATCACTGCTGATCCCACAGAAATACAAACTACCATC
AGAGAATACTATAAACACCTCTATGCAAATAAACTAGAAAATCTAGAAGAAATGGATAAATTCCTGGCCA 
CATGCACCCTCCCAAGACTAAACCAGGAAGAGTTAGAATCCCTGAATAGACAAATAACAAGTTCTGAAAT
TAAGGCAGTAATTAATAGCCTACCAACCAAACAAAAGCCCGGACCAGATGGATTCACAGCTGAATTCTAC
CAGAGGTACAAAGAGGAGCTGGTACCATTCCTTCTGAAACTATTCCAAACAACAGAAAAAGAGGGACTCC
TTCCTAACTCATTTTATGAGGCAAGCATCATGCTGATGCCAAAATCTGGCAGAGACACAACAAAAAAAGA
AAATTTCAGGCCTATATCCCTGATGAACATCGATGTGAAAATCCTCAATAAAATACTGGCAAACCAAATC 
TTGCAGCACATC^WWVGCTTATCCACGATGATCAAGTTGGCTTCATCCCTGGGATGCAAGGCTGGTTCA 
ACATATGCAAATCAATCAACATAATCCATCACATAAATAGCACCAATGACAAAAACCACATGATTATCTC
AATAGATGCAGAAAAGGCCTTTGGTAAAATTCAACACCCCTTCATGCTAAAAACTCTAAATAAGCTAGGT 
ATTGATGGAACGTATCTCAAAATAATAAGAGCTGTTTATGACAAACCCACAGCCAATATCATACTGACTG
GGCAAAAGCTGGAAGCATTCCCTTTGAAAACCAGCACAAGACAAGTATGCCCTCTCTCACCACTCCTATT
CAACATGGTATTGGAAGTTCTGGCTAGGGCAATCAGGCAAGAGAAAGAAATAAAGCATATCCAAATAGGA
AGAGAGGAAGTCAAATTGTCCCTGTTTGCAGATGACATGATTGTATATTTAGAAAACCCCATCGTCTCAG 
CCCAAAATCTCCTTAAGCTGATAAGAAACTTCAGCAAAGTCTCGGGATAGAAAATCAATGTGCAAAAATC 
ACAAGCATTCCTATACATCAATAATAGACAAACAGAGAGCCAAATCGTGAGTGAACTCCCATTCACAATT
GTTACAAAGAGAATACAATACCTAGGAATACAACTTACAAGGGATGTGAAGGACCTCTTCAAGGAGAACT
ACAAACCACTGCTCAAGGAAATAAGAGAGGACACAAACAAATGGAAAAACATTCTATGCTCATGGATAGG
AAGAATCAATATCGTGAAAATGACCATGCTGCCCAAAGTAATTTATAGATTCAACACTATGCCCATCAAG
CTACCATTGACTTTCTTCACGGAATCAGACAAAACTACTTTAAATTTCATATGGAACCAAAAAAGAGCCT 
GCACAGCCAAGACAATCCTAAGCAAAAAGAACAAAGCTGGAGGCATCACACTACCTAACTTCAAACTATA
CTACAAGGCTACAGTGACCAAAACAGCATGGTACTGGTACCAAAACAGATATACAGACCAATGGAACAGA
ATAGAGGCCTCAGAAATAACACC^CACATCTACAACCACCTGATCTTTGACAAACC 

AATGGGGAAAAGGATCTCTATTTAATAAATGGTGTTGGGAAAACTGGCTAGCCATATGCAGAAAACTGAA
ACTGGACCCCTTCCTTACACTTTATACAAAAATTAATTCAAGCTGGATTAAAGACTTAAATGTAAGACCT 
AAAACAATAAAAATCCTAGAAGAAAACCTGGGCAATACCATTCAGGACATAGGCATGGGCAAAGACTTCG
TGACTGTAACACGAAAAGCAATGGCAACAAAAGCCAAAATTGACAAATGGGATCTAATTAAACTAAAGAG
CTTCTGCACAGC^AAAGAAACTGTCATCAGGGTGAACAGGCAACCTAC^GAATGGGAAAAATTTTTTGCA 
ATCTGTCC^TCTGACAAAGGGCTAATATCCAGAATCTACAAGGAACTTAAACAAATTTACAAGAAAAAAA 
CAAACAACCCTATCAAAAAGTGGGCAAAGGCTATGAACAGACACTTCTCAAAAGAAGACATTTATGCAGC
CAAAAGACATATGAAAAAATGGTCATCATCACTGGTCTTCAGGGAAATGCAAATCAAAACCACAATGAGA
TACCATCTCATGCCAGTTAGAATGGTGATCATTAGAAAGTC^GGAAACAACACATGCATGCAAATCAAAA 
CCACAATGAGATACCATCTCATGCCAGTTAGAATGGTGATCATTAGAAAGTCAGGAAACAACACATGCAG
AGGATGTGGAGAAATAGGAATGCTTTTACACTGTTGGTGGGAGTGTAAACTAGTTCAACCATTGTGGAAG 
ACAGTGTGGCGATTCCTCAAGGATCTAGAACCAGAAATACCATTAGACCCAGCAATCCCATTACTGGGTA
TATACCCAAATGATTATAAATCATGCTACTATAAAGACACATGCACACGTATGTTTATTGCGGCACTATT
CACAATAGCAAAGACTTGGAACCAACCCAAATGCCCATCAGTGAGAGTCATAAAGAAAATGTGGCACATA
TACATCATGGAATACTATGCAGCCATAAAAAAGGATGAGTTCATGTCCTTTGCAGGGACATGGATGAATC
TGGAAACCACCATTCTCAGCAAACTAACACAGGAACAGAAAACCAAATACCGCTTGTTCTCACTCGTAAG 
TTGGAGTTGAACAATGAGAACACATGGACACAGGGAGGGGAACAACACCAGGGCCTGTCAGGGGGTAGGG 
GGGATAGGGGAGGGATAGCATTAAGAGAAATACCTAATGTAGATGACGGGTTGATGGGTGCAGCAAACCA
CCAAGGC 



In a search of public sequence databases, the NOV63 nucleic acid sequence, located on
chromosome 13, has 3146 of 3647 bases (86%) identical to a
gb:GENBANK-ID:HSIL25FL|acc:X67285.1 mRNA from Homo sapiens (H.sapiens gene for interleukin-2 (5'
flanking region)). Public nucleotide databases include all GenBank databases and the 
GeneSeq patent database. 

The disclosed NOV63 polypeptide (SEQ ID NO: 150) encoded by SEQ ID NO: 149 has 
1081 amino acid residues and is presented in Table 63B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV63 has no signal peptide and is 
likely to be localized in the cytoplasm with a certainty of 0.6000. 



Table 63B. Encoded NOV63 protein sequence (SEQ ID NO: 150). 



MEIITNSLSDHSAIKLELRIKKLTQNHTTTWKLKNLLLNNYLVNNEIKAEINKFCETNEN
KDTTYQNFWDTAKAVVRGKFIALNAHRRKQERCKINTLTSQLKELEKQEQTNSKANRRQE
ITKIIAELKEIKTRKTHQKINESGSWFFEKINKIDRQLARLIKKRREKNQIDAIKNDKGD
ITADPTEIQTTIREYYKHLYANKLENLEEMDKFLATCTLPRLNQEELESLNRQITSSEIK
AVINSLPTKQKPGPDGFTAEFYQRYKEELVPFLLKLFQTTEKEGLLPNSFYEASIMLMPK
SGRDTTKKENFRPISLMNIDVKILNKILANQILQHIKKLIHDDQVGFIPGMQGWFNICKS
INIIHHINSTNDKNHMIISIDAEKAFGKIQHPFMLKTLNKLGIDGTYLKIIRAVYDKPTA
NIILTGQKLEAFPLKTSTRQVCPLSPLLFNMVLEVLARAIRQEKEIKHIQIGREEVKLSL
FADDMIVYLENPIVSAQNLLKLIRNFSKVSGYKINVQKSQAFLYINNRQTESQIVSELPF
TIVTKRIQYLGIQLTRDVKDLFKENYKPLLKEIREDTNKWKNILCSWIGRINIVKMTMLP
KVIYRFNTMPIKLPLTFFTESDKTTLNFIWNQKRACTAKTILSKKNKAGGITLPNFKLYY
KATVTKTAWYWYQNRYTDQWNRIEASEITPHIYNHLIFDKPDTNKQWGKGSLFNKWCWEN
WIAICRKLKLDPFLTLYTKINSSWIKDLNVRPKTIKILEENLGNTIQDIGMGKDFVTVTP
KAMATKAKIDKWDLIKLKSFCTAKETVIRVNRQPTEWEKFFAICPSDKGLISRIYKELKQ
IYKKKTNNPIKKWAKAMNRHFSKEDIYAAKRHMKKWSSSLVFREMQIKTTMRYHLMPVRM
VIIRKSGNNTCMQIKTTMRYHLMPVRMVIIRKSGNNTCRGCGEIGMLLHCWWECKLVQPL
WKTVWRFLKDLEPEIPLDPAIPLLGIYPNDYKSCYYKDTCTRMFIAALFTIAKTWNQPKC
PSVRVIKKMWHIYIMEYYAAIKKDEFMSFAGTWMNLETTILSKLTQEQKTKYRLFSLVSW
S



A search of sequence databases reveals that the NOV63 amino acid sequence has 752 
of 865 amino acid residues (86%) identical to, and 787 of 865 amino acid residues (90%) 
similar to, the 1010 amino acid residue patp:B38012 Human secreted protein encoded by gene 
3 clone HNHCT15. Public amino acid databases include the GenBank databases, SwissProt, 
PDB and PIR. 

The disclosed NOV63 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 63C. 



Table 63C. BLAST results for NOV63

Gene Index/Identifier                            Protein/Organism                                              Length (aa)  Identity (%)    Positives (%)   Expect
gi|106322|pir||B34087                            protein (L1H 3' region) - human                               1280         919/1078 (85%)  966/1078 (89%)  0.0
gi|2072948|gb|AAC51261.1| (U93563)               putative p150 [Homo sapiens]                                  1275         904/1078 (83%)  959/1078 (88%)  0.0
gi|339777|gb|AAB59368.1| (M80344)                ORF2 contains a reverse transcriptase domain. [Homo sapiens]  1275         898/1078 (83%)  957/1078 (88%)  0.0
gi|5052951|gb|AAD38785.1|AF149422_2 (AF149422)   unknown [Homo sapiens]                                        1275         900/1078 (83%)  957/1078 (88%)  0.0
gi|2136112|pir||S65824                           reverse transcriptase homolog - human transposon L1.1         1275         898/1078 (83%)  957/1078 (88%)  0.0


Table 63D lists the domain descriptions from DOMAIN analysis results against 
NOV63. This indicates that the NOV63 sequence has properties similar to those of other 
proteins known to contain this domain.

Table 63D. Domain Analysis of NOV63

gnl|Pfam|pfam00078, rvt, Reverse transcriptase (RNA-dependent DNA
polymerase). A reverse transcriptase gene is usually indicative of a
mobile element such as a retrotransposon or retrovirus. Reverse
transcriptases occur in a variety of mobile elements, including
retrotransposons, retroviruses, group II introns, bacterial msDNAs,
hepadnaviruses, and caulimoviruses.

CD-Length = 208 residues, 97.6% aligned

Score = 99.0 bits (245), Expect = 1e-21

Secreted proteins can act as cytokines, growth factors, chemotactic factors, and ligands
for cell surface receptors. Secreted proteins play vital roles in the regulation of cell motility,
proliferation, differentiation and apoptosis.

The disclosed NOV63 nucleic acid of the invention encoding a secreted protein-like
protein includes the nucleic acid whose sequence is provided in Table 63A or a fragment
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may
be changed from the corresponding base shown in Table 63A while still encoding a protein
that maintains its secreted protein-like activities and physiological functions, or a fragment of
such a nucleic acid. The invention further includes nucleic acids whose sequences are
complementary to those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 14 percent of the bases may be so changed.
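
A nucleic acid whose sequence is complementary to a disclosed sequence is obtained by complementing each base and reversing the strand. The following Python sketch illustrates this for unmodified DNA bases only; the helper name `reverse_complement` is hypothetical, and the input is the first 15 nucleotides of the NOV63 coding region (encoding MEIIT) as read from Table 63A.

```python
# Map each DNA base to its Watson-Crick partner (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return sequence.translate(_COMPLEMENT)[::-1]


coding_start = "ATGGAAATCATAACA"
print(reverse_complement(coding_start))  # TGTTATGATTTCCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check when preparing antisense fragments.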

The disclosed NOV63 protein of the invention includes the secreted protein-like
protein whose sequence is provided in Table 63B. The invention also includes a mutant or
variant protein any of whose residues may be changed from the corresponding residue shown
in Table 63B while still encoding a protein that maintains its secreted protein-like activities
and physiological functions, or a functional fragment thereof. In the mutant or variant protein,
up to about 14 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this secreted protein-like
protein (NOV63) may function as a member of a "secreted protein family". Therefore, the
NOV63 nucleic acids and proteins identified here may be useful in potential therapeutic 
applications implicated in (but not limited to) various pathologies and disorders as indicated 
below. The potential therapeutic applications for this invention include, but are not limited to: 
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug 
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene 
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues 
and cell types composing (but not limited to) those defined here. 

The NOV63 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the secreted protein-like 
protein (NOV63) may be useful in gene therapy, and the secreted protein-like protein 
(NOV63) may be useful when administered to a subject in need thereof. By way of 
nonlimiting example, the compositions of the present invention will have efficacy for 
treatment of patients suffering from CNS disorders, brain disorders including epilepsy, eating 
disorders, schizophrenia, ADD; cancer; heart disease; inflammation and autoimmune disorders 
including Crohn's disease, IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin 
disorders, blood disorders; psoriasis, colon cancer, leukemia, AIDS; thalamus disorders;
metabolic disorders including diabetes and obesity; lung diseases such as asthma, emphysema,
cystic fibrosis; pancreatic disorders including pancreatic insufficiency and cancer; and
prostate disorders including prostate cancer, or other pathologies or conditions. The NOV63 
nucleic acid encoding the secreted protein-like protein of the invention, or fragments thereof, 
may further be useful in diagnostic applications, wherein the presence or amount of the nucleic 
acid or the protein are to be assessed. 

NOV63 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV63 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti- 
NOVX Antibodies" section below. The disclosed NOV63 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 

NOV64

A disclosed NOV64 nucleic acid of 3081 nucleotides (also referred to as CG57779-01)
encoding a secreted protein-like protein is shown in Table 64A. The start and stop codons
are in bold letters.



Table 64A. NOV64 nucleotide sequence (SEQ ID NO:151). 

AAATGTAAAAGAACTGTCTCTCA^ 

AAAACTGCCCAACTACATGGAAACTGAACAACCTGCTCCTGAATGACTACTGGGTACATAACAAAATCAA 
GG C AGAAAT AAAGATGTT C TTTGAAAC CAACAAGAAC AAAGAC ACAAC AT AC CAG AAT CT CTGGG AC ACA 
TTCAAAGCAGTGTGTAGAGGGAAATTTATAGCACTAAATACCCATAAGAGAAAGCAGGAAAGATCTAAAA 
TTG AC AC C CT AACAT CAC AATTAAAACAACT AC AG AAG CAAG AG CAAAC AC AT TC AAAAG CTAG CAG AAG 
G CAAG AAATAACT AAG AT CAG AG CAG AAC TG AAGG AG ATAG AG ACACAAAAAAATG C T T CAAAAAAAAAT 
GAACTTAAAAAAGATAAAGGGGTTTGGTCCACCGATCCCAGAGAAAAACACACTACCATCAGAGAATACT 
ATAAACAC CTG TATGCAAATAAACTAGAAAATCTGGAAG AAATGGATAAAT TT CTGGACAAATACACCTT 
C C CAAGACTAAAC CAGG AAGAAG TTGAAT C C CTGAATAGACCAATAACAGG CT CGG AAATTGAGG CAATA 
ATTAATAGCTTACC^CC^AAAAAAGTCCAGGGTCAGATGGATTCACAGCCGAATTCTACCAGAGGTACA 
AGGAGGAG CTGGTAC CATT CCTT CTG AAACTATT C CAAT CAATAGAAAAAGAGGG AAT C CT CC CTAACT C 
ATT TG ATGAGGCCAGCAT CAT CCTGAT AC CAAAG C CTAGCAGAG AC ACAACAAAAAAAGAGAATT TT AG A 
CCAATATCCCTGATGAACATCGATGCAAAAATCCTCAATAAAATACTGG CAAAACGAATCCAGCAG CACA 
TCAAAAAGTTTATCmCCACGATCAAGTGGGCTTCATCCCTAGGATGCAAGGCTGGTTTAACATATGCAA 
ATCAATAAACG TAAT C CAG CATATAAATAGAAC CAAAGACAAAAACCACATGAT TATC T CAATAGATG C A 
GAAAAGGCCTTTGACAAAATTCAACAGCCCTTCATGCTAAAAACTCTCAGTAAATTAGGTATTGATATGA 
CATATCTCAAAATAATAAGAG CTAT C TATGACAAAC C CACAGC CAATATCATAC TGAATGGG CAAAAACT 
GGAAGCATTCCCTTTGAAAACTGGCACAAGACATGGGTGCCCTCTCTCACCACTCCTATTCAACATAGTG 
TTGGAAGT C CTGG C CAGGG CAAT CAGG CAGGAGAAGGAAATAAAGGGTATT CAATTAGG AAAAG AGG AAG 
T C AAAT TGT C C CTGTT TG C AGAT G ACAT G AT TTT AT AT C T AGAAAA C C C CAT CGT C T CAG C C C AAAAT CT 
CCTTAAGCTGATAAGCAACTTCAGCAAAGTCCCAGGATACAAAATCAATGTGCAAAAATCACAAGCATTC 
TT AT AC AC CAAT AACAG ACAG AC AG AG AG CCAAATCATG AG TG AAC TC CCATTT AC AATTG CTTC AAAG A 
GAATAAAATACCTAGGAATC CAACTTACAAGGGATGTGAAGGACTCTTCAAGGAGAACTACAAACCCATG 
CT CAAT TGAAATAAAAG AGGATACAAACAAATGGAAGAACATT CCATG CT CATGGGT AAGAAGAAT CAAT 
ATTGTGAAAATGGCCATTCTGCCCAAGGTAATTTATAGGTTCAATGCCATCCCCATCAAGCTACCAATGG 
CTTTCTTCACAGAATTGGAAAAAACTACTTTAAAGTTCATATGGAACCAAAAAAGAGCCTGCATTGCTAA 
GCCTGCATTGCTAAGCCAAAAGAACAAAGCTGGAGGCATCATGCTACCTGACTTCAAACTATACTACAAG 
G CCACAGTAAC CAAAACAG CATGGT ACTGGT AC CAAAACAG ATATATAGAC CAATGGAACAAAGCAGAG C 
CCTCAGAAATAATGCCACACATCTATAACTATCTGATCTTTGACAAACCTGACAAAAACAAGAAATCGGG 
AAAGGATTCCGTATTTAATAAACGGTCCTGGGAAAACTGGCTAGCCATATGTAGAAAGCTGAAACTGGAC 
CCCTT C CT TACAC CT CATACAAAAAT TAATT CAAGATGGATTAAAGAC TTAAATG TTAGAC C TAAAAC CA 
TAAAAAC C CTAGAAG AAAAC CTAGG CAATAC CATTCAGG ACATAGGCATGGG CAAGGACT TCATG TC TAA 
AACACCAAAAGCAATGGCAACAAAAGACAAAATTGACAAATGGGATCTAATTAAACTAAAGAGCTTCTGC 
AC AGCAATAGAAACTAC CAT CAGAG TGAACAGG CAAC C TACAGAATGGGAGAAAATTT TTGCAAC CTACT 
CAT CTGACAAAGGGC T AATAT C CAG AAT CCACAATG AACT CAAACAAATT TACAAG AAAAAAT CAAACAA 
CCCCATCAAAAAGTGGG CAAAGGATATGAACAGACACTTCTCAAAAGAAGACATTTATGCAG CCAAAAGA 
CACATGAAAAAATGCT CAT CAT CACTGG C CAT CAGAGAAATG CAAATGAAAAC CAC AATGAGATAC CAT C 
T CAC ACCAGT T AGAATGGCG AT CAT TAAAAAGTC AGGAAACAAC AGGTGCTGG AG AGG ATGTGGAGAAAT 
AGGAACACTTTTACGCTGTTGGTGGGACTGTAAACTAGTTCAACCATTGTGGAAGTCAGTGTGGCGATTC 
CTCAGGGATCTAGAACTAGAAATACCATTTGACCCAGCCATCCCATTACTGGGTATATACCCAAAGGACT 
ATAAAT CATG CTG CTAT AAAGACACATG CAG C CG TATGT T TG TTG CAG CACTAT T CAC AACAG CAAAGAC 
TTGGAAC CAA.CC CAAATGTC CAACAATG ATAGAC TGGATTAAGAAAATG TGG CACATATACACCATGGAA 
TACTATG CAG CCACAAAAAAAAAGGATGAGT T CATGT C CT TTGCAGGGACATGGATGAAG CTGGAAACCA 
TCATTCTCAGCAAACTATCACAAGGACAGAAAACCAAACACTGCATGTTCTCACTCATAGGTGGGAATTA 
G 



In a search of public sequence databases, the NOV64 nucleic acid sequence, located on
chromosome 13, has 2741 of 3050 bases (89%) identical to a gb:GENBANK-
ID:HSU157D4|acc:Z68871.1 mRNA from Homo sapiens (Human DNA sequence from
cosmid U157D4, between markers DXS366 and DXS87 on chromosome X). Public
nucleotide databases include all GenBank databases and the GeneSeq patent database.

The disclosed NOV64 polypeptide (SEQ ID NO:152) encoded by SEQ ID NO:151 has
1017 amino acid residues and is presented in Table 64B using the one-letter amino acid code.
Signal P, Psort and/or Hydropathy results predict that NOV64 has no signal peptide and is
likely to be localized in the cytoplasm with a certainty of 0.3906.

Table 64B. Encoded NOV64 protein sequence (SEQ ID NO:152). 

rWQSKLELSIKKLTQNCPTTWKLNN^ 

TFKAVCRGKF I ALNTHKRKQERS KI DTLT S QLKQLQKQE QTHS KAS RRQE I TKI RAELKE 
IETQKNASKKNELKKDKGWSTDPREKHTTIREYYKHLYANKLENLEEMDKFLDKYTFPR 
LNQEEVESLNRPITGSEIEAIINSLPTKKSPGSDGFTAEFYQRYKEELVPFLLKLFQSIE 
KEG I LPNS FDE AS 1 1 L I PKPS RDTTKKENFRP I S LMN I DAKI LNK I LAKR I QQH I KKF IH 
HDQVGFIPRMQGWFNICKSINVIQHINRTKDKNHMIISIDAEKAFDKIQQPFMLKTLSKL 
GIDMTYLKIIRAIYDKPTANIILNGQKLEAFPLKTGTRHGCPLSPLLFNIVLEVLARAIR 
QEKEIKGIQLGKEEVKLSLFADDMILYLENPIVSAQNLLKLISNFSKVPGYKINVQKSQA 
FLYTNNRQTESQIMSELPFTIASKRIKYLGIQLTRDVKDSSRRTTNPCSIEIKEDTNKWK 
NIPCSWVRRINIVKMAILPKVIYRFNAIPIKLPMAFFTELEKTTLKFIWNQKRACIAKPA 
LLS QKNKAGG I MLPDFKL YYKATVTKTAWYWYQNR Y I DQ WNKAE P S E I MPH I YNYL I FDK 
PDKNKKS GKDS VFNKRS WENWIiAI CRKLKLDPFLTPHTKINSRWI KDLNVRPKT I KTLEE 
NLGNTIQDIGMGKDFMSKTPKAMATKDKIDKWDLIKLKSFCTAIETTIRVNRQPTEWEKI 
FAT YS S D KGL I S R I HNE LKQ I YKKKS NNP I KKWAKDMNRHF S KE D I YAAKRHMKKC S S S L 
AIREMQMKTTMRYHLTPVRMAI IKKSGNNRCWRGCGE IGTLLRCWWDCKLVQPLWKS VWR 
FLRDLELEIPFDPAIPLLGIYPKDYKSCCYKDTCSRMFVAALFTTAKTWNQPKCPTMIDW 
I KKMWH I YTME YYAATKKKDEFMS FAGTWMKLET 1 1 LS KLSQGQKTKHCMFS L I GGN 



A search of sequence databases reveals that the NOV64 amino acid sequence has 829
of 977 amino acid residues (84%) identical to, and 864 of 977 amino acid residues (88%)
similar to, the 1010 amino acid residue patp:B38012 Human secreted protein encoded by gene
3 clone HNHCT15. Public amino acid databases include the GenBank databases, SwissProt,
PDB and PIR.
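
The percentages quoted alongside the match counts can be reproduced with simple integer arithmetic. The sketch below is illustrative only: the function name `truncated_percent` is hypothetical, and truncation (rather than rounding) is an assumption inferred from the reported figures, since 829 of 977 is about 84.85% yet is reported as 84%.

```python
def truncated_percent(matches, aligned):
    """Whole-number percentage, truncated rather than rounded (assumption)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# Counts quoted in the text for NOV64.
print(truncated_percent(2741, 3050))  # nucleotide identity -> 89
print(truncated_percent(829, 977))    # amino acid identity -> 84
print(truncated_percent(864, 977))    # amino acid similarity -> 88
```

Integer division avoids floating-point edge cases near whole-percent boundaries.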

The disclosed NOV64 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 64C. 


Table 64C. BLAST results for NOV64

Gene Index/Identifier                           Protein/Organism                 Length (aa)  Identity (%)    Positives (%)   Expect
gi|106322|pir||B34087                           protein (L1H 3' region) - human  1280         929/1044 (88%)  956/1044 (90%)  0.0
gi|5052951|gb|AAD38785.1|AF149422_2 (AF149422)  unknown [Homo sapiens]           1275         920/1044 (88%)  949/1044 (90%)  0.0
gi|2072948|gb|AAC51261.1| (U93563)              putative p150 [Homo sapiens]     1275         919/1044 (88%)  946/1044 (90%)  0.0
gi|2072958|gb|AAC51267.1| (U93567)              p150 [Homo sapiens]              1275         918/1044 (87%)  947/1044 (89%)  0.0
gi|5070622|gb|AAD39215.1|AF148856_2 (AF148856)  unknown [Homo sapiens]           1275         918/1044 (87%)  949/1044 (89%)  0.0


Table 64D lists the domain descriptions from DOMAIN analysis results against 
NOV64. This indicates that the NOV64 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 64D. Domain Analysis of NOV64 

gnl|Pfam|pfam00078, rvt, Reverse transcriptase (RNA-dependent DNA
polymerase). A reverse transcriptase gene is usually indicative of a
mobile element such as a retrotransposon or retrovirus. Reverse
transcriptases occur in a variety of mobile elements, including
retrotransposons, retroviruses, group II introns, bacterial msDNAs,
hepadnaviruses, and caulimoviruses.

CD-Length = 208 residues, 97.6% aligned

Score = 109 bits (272), Expect = 9e-25



The disclosed NOV64 nucleic acid of the invention encoding a secreted protein-like 
protein includes the nucleic acid whose sequence is provided in Table 64A or a fragment 
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may
be changed from the corresponding base shown in Table 64A while still encoding a protein 
that maintains its secreted protein-like activities and physiological functions, or a fragment of 
such a nucleic acid. The invention further includes nucleic acids whose sequences are 
complementary to those just described, including nucleic acid fragments that are 
complementary to any of the nucleic acids just described. The invention additionally includes 
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include 
chemical modifications. Such modifications include, by way of nonlimiting example, 
modified bases, and nucleic acids whose sugar phosphate backbones are modified or 
derivatized. These modifications are carried out at least in part to enhance the chemical 
stability of the modified nucleic acid, such that they may be used, for example, as antisense 
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic 
acids, and their complements, up to about 11 percent of the bases may be so changed.
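
As an illustration of the "up to about 11 percent" limit, the following sketch counts position-wise base changes between a reference and a candidate variant. It is a simplified reading, not a definition taken from the specification: it assumes equal-length sequences with substitutions only (no insertions or deletions), and the function names are hypothetical. The 11 percent figure matches the complement of the 89% nucleotide identity reported for NOV64, although the text does not state that it was derived that way.

```python
def count_changes(reference, variant):
    """Number of positions at which two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length (substitutions only)")
    return sum(1 for a, b in zip(reference, variant) if a != b)


def within_variant_limit(reference, variant, max_percent):
    """True if at most max_percent of positions differ (integer arithmetic)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100 * count_changes(reference, variant) <= max_percent * len(reference)


if __name__ == "__main__":
    ref = "ATGAAGGCAG"   # toy 10-base reference, for illustration only
    one = "ATGAAGGCAA"   # 1 of 10 bases changed (10%)
    two = "ATGTAGGCAA"   # 2 of 10 bases changed (20%)
    print(within_variant_limit(ref, one, 11))  # True
    print(within_variant_limit(ref, two, 11))  # False
```

The same check applies to the protein variants by substituting the residue limit stated for the polypeptide.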

The disclosed NOV64 protein of the invention includes the secreted protein-like
protein whose sequence is provided in Table 64B. The invention also includes a mutant or
variant protein any of whose residues may be changed from the corresponding residue shown
in Table 64B while still encoding a protein that maintains its secreted protein-like activities
and physiological functions, or a functional fragment thereof. In the mutant or variant protein,
up to about 16 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this secreted protein-
like protein (NOV64) may function as a member of a "secreted protein family". Therefore, the
NOV64 nucleic acids and proteins identified here may be useful in potential therapeutic
applications implicated in (but not limited to) various pathologies and disorders as indicated
below. The potential therapeutic applications for this invention include, but are not limited to:
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV64 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the secreted protein-like
protein (NOV64) may be useful in gene therapy, and the secreted protein-like protein
(NOV64) may be useful when administered to a subject in need thereof. By way of
nonlimiting example, the compositions of the present invention will have efficacy for
treatment of patients suffering from CNS disorders, brain disorders including epilepsy, eating
disorders, schizophrenia, ADD; cancer; heart disease; inflammation and autoimmune disorders
including Crohn's disease, IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin
disorders, blood disorders; psoriasis, colon cancer, leukemia, AIDS; thalamus disorders;
metabolic disorders including diabetes and obesity; lung diseases such as asthma, emphysema,
cystic fibrosis, pancreatic disorders including pancreatic insufficiency and cancer; and
prostate disorders including prostate cancer, or other pathologies or conditions. The NOV64
nucleic acid encoding the secreted protein-like protein of the invention, or fragments thereof,
may further be useful in diagnostic applications, wherein the presence or amount of the nucleic
acid or the protein are to be assessed.

NOV64 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV64 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-
NOVX Antibodies" section below. The disclosed NOV64 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.

NOV65 

A disclosed NOV65 nucleic acid of 3021 nucleotides (also referred to as CG57781-01)
encoding a secreted protein-like protein is shown in Table 65A. The start and stop codons are
in bold letters.



Table 65A. NOV65 nucleotide sequence (SEQ ID NO:153).



AATGACTACTGAGTAAATAATGAAATGAAGGCAGAAATAAAGATGTTCTTTGAAACCAATGAGAACAAAG 
AC AC AATG TAC CAG AATCT C TGGG ACAC ATT T AAAG C AGTGTG T AG AGGG AAATT T AT AG C ACT AAATGC 
CCACAAGAGAAAG CAGGAAAGAT C TAAAAT CAACAT CCTAACATCACAGT TAAAAGAACTAGGGAAGCAA 
GAACAAACAAATT CAAAAG C TAG CAGAAGG CAAAAAATAACTAAGAT CAGAG CAGAACTGAAGG AG ATAG 
AGACACAAAAAAC CCTTCAAAAAAT CAATGAAT C CAGGAG CTGGT TTTTTGAAAAGAT CAACAAAATTGA 
TAGACAAC TAG CAAGACCAATAAAG AAGAAAAGAGAGAAGAAT CAAATAG ATG CAACAAAAAATGATAAA 
GGGG ATAT CAC CACTGAT C C CACAGAAATACAAACTAC CAT CAGAGAATACTATCAACACTT C TATGCAA 
AT ATACTAGAAAATCTAGAAGAAATGGAT AAATT C CTGGACACATACACT CT C CCAAGACTAAAC CAGG A 
AGAAGTTGAAT CTCTGTATAGACCAATAACAGGTT CTGAAATTGAGGCAATAATTAATAGGCTAC CAACC 
AAAAAAAG T C CAGG AC C AG ATGG ATTCACAG CTG AATT CT AC C AG AGG T ACAAAG AGG AG CTGGT AC CAT 
TCCTTCTGAAACTATTTCAGACAACAGAAAAAGAGGGACTCCTCCCTAACTCATTTTATGAGGCCAGCAT 
CAT CCTGACAC CAAAACC TGG TAGAG ACACAACAAAAAAAGAGAATTTTATG CCAATATC CCTG ATGAAC 
ATTGATGCGAAAGTCCTCAATAAAATACTGGCAAAAGCTTATCCACCACATCAAAAGCTTATCCACCACG 
GTCAAC TTGG CTT CAT CC CTGGGATG CAAGG CTGGT TCAACATATG CAAATCAATAAATG TAG TT CAT CA 
CATAAACAG AACCAATGACAAAAAC CACATGATT AT C T CAAT AGATG CAGAAAAGG CC T TCGACAATATT 
CAACACCACTTCATGCTAAAAACTCTGAGTAAACTAGGTATCGATGGAACATATCTGAAAATAATAAGAG 
CTATTTATGACAAAC C CAC AG C CAAT AT CATAGTGAATGGG CAAAAACTGG AAGCATT C C CTTTG AAAAC 
TGGCACAAGACAAGGATGCCCTCTCTCACCACTCCTATTCAACATAGTGTTGGAAGTTCTGGCTAGGGCA 
ATCAGG C AAG AG AAAG AAAT AAACGG T ATT CAATT AGG AAAAG AGG AAG TCAAAT TG TCT CTG TG TG CAG 
ATGACATGATTGTAT ATT TAG AAAAC C C CATCG TC T CAGC CCAAAAT C T C CT TAAG C TG ATAAG CAACTT 
CAG CAAAG TCT CAGGATACAAAATCAATGTG CAAAAAT CACAAG CAT T C C TATACAT CAATAATAG ACAA 
ACAGAGAGCCAAATCATGAGTGAACTCCCATTCCCAATTACCACAAAGAGAATTAAATACCTAGGAATCC 
AAC TTACAAGGGATGTGAAAG AC CT CT T CAAGGAGAACTACAAAC CAC TG CT CGAAATAAAAG AGGACAC 
AAACAAATGGAAAAACAT T C CATG CT C ATGGATAGG AAGAAT CAATAT TG TGAAAATGG T CATACTG CCC 
AAAGTAATTTATAG ATT CAATG CTATC C CCATCAAG CTAC CACTG ACTTT CTTCACAGAATTGGAAAAAA 
CT ATT T T AAAGTT C AT ATGGAAC CAAAAAAGAAC C C AGATTG C CAAGACAAT C C T AAGC AAAAAGAAC AA 
AGCTGGAGGCATCACACTACCTGACTTCAAACTATACTACAAGGCTACAGTAAACAAAACAGCATGGTAC 
TGG TAC C AAAACAG AT ATAT AG AC CAATGG AACAGAATGG AGG C C TC AG AAAT AACAC CAC AC AT CTAC A 
ACCATCTGATCTTTGACAAACCTGACAAAAACAGGCAATGGGGAAAGGATTCTCTATTTAATAAATGGTG 
CTGGGAAAACTGGCTAGCCATATGTAGAAAGCTGAAACTGGACCCCTTCCTTACACCTTATACAAAAATT 
AACACAAGATGGATTAAAG ACT TAAACG TCAGACCTAATACCATAAAAAC C C TAGAAGAAAAC CTAGG CA 
ATAC CAT T CAGGACATAGG CATGGG CAAAG TCTT CATGACTAAAACACCAAAAG CAATGG CAACAAAAG T 
CAAAATTGACAAATGGGATCTAATTAAACTAAAGAGCTTCTGCACAGCAAAAGAAACTATCATCAGAGTG 
AACAGG CAAC CTACAGAATGGGAGAAAAT CTTTG CAAC CTAC C CATCTGACAAAGGG CTAATAT C CAGAA 
T C T ACAAAGAAC T C AAACAAATT T AC AAGAAAAAAAAAACAAC CC CAT C AAAAAGTGGGCAAAT ACAAGA 
AAAAAAAAACAACCCCAT CAAAAAGTGGG CAAAGGATATG AGCAGACACTTCTCAAAAGAAGACATTTAT 
GCAG C CAACAGAATGAAAAAGTGG TCATCATCACTGG TC TTCAGGGAAATG CAAAT CAAAA C CACAATGA 
GATAC CAT CT CATG C CAG TTAGAATGG TGATCATTAAAAAG T CAGG AAAC AACACATG C CTG AGAGGATG 
TGGAGAAATAGGAATG C TTT TACAC TGTTGGTGGGAGTGTAAACTAG TT CAACCAT TGTGGAAGACAGTG 
TGG CGATT C C T CAAGG AT CTAGAAC CAG AAATAC CATTAG AC CCAG CAATC C CAT TACTGGGTATATAC C 
CAAACAATTATAAATCATGCTACTATAAAGACACACGCACACGTATGTTTATTGTGGCACTATTCACGAT 
AG CAAAGAAGGATGAGT T CATGTC CTT TG CAGGGACATGGATGAAG C TGGAAAC CAT CATT CTAAG CAAA 
CTATCACAAGGACAGAAAACCAAACACCACATGTTCTCACTCATAGGTGGGAGTTGAACAACGAGAACAC 
ATGGACACAGG 



In a search of public sequence databases, the NOV65 nucleic acid sequence, located on
chromosome 13, has 2524 of 2878 bases (87%) identical to a gb:GENBANK-
ID:F229117S02|acc:AF229118.1 mRNA from Homo sapiens (Homo sapiens
acetylcholinesterase collagen-like tail subunit (COLQ) gene, exons 1A, 2, 3, 4, and 5). Public
nucleotide databases include all GenBank databases and the GeneSeq patent database.

The disclosed NOV65 polypeptide (SEQ ID NO:154) encoded by SEQ ID NO:153 has
990 amino acid residues and is presented in Table 65B using the one-letter amino acid code.
Signal P, Psort and/or Hydropathy results predict that NOV65 has no signal peptide and is
likely to be localized in the cytoplasm with a certainty of 0.7000.
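
The relationship between the nucleotide sequence of Table 65A and the polypeptide of Table 65B can be illustrated by translating an open reading frame with the standard genetic code. The sketch below is a minimal illustration and not the method used to prepare the tables: the function names are hypothetical, only the forward strand is scanned, and input is assumed to be upper-case A/C/G/T. `FRAGMENT` is the first 70 bases of Table 65A; its longest ATG-initiated frame begins at 0-based offset 24 and yields the first 15 residues shown in Table 65B.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_from(dna, start):
    """Translate codons from `start` until a stop codon or the end of `dna`."""
    protein = []
    for pos in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def longest_orf(dna):
    """Return (offset, protein) for the ATG giving the longest translation."""
    best = (-1, "")
    start = dna.find("ATG")
    while start != -1:
        protein = translate_from(dna, start)
        if len(protein) > len(best[1]):
            best = (start, protein)
        start = dna.find("ATG", start + 1)
    return best


# First 70 bases of the NOV65 sequence in Table 65A.
FRAGMENT = ("AATGACTACTGAGTAAATAATGAAATGAAGGCAGAAATAA"
            "AGATGTTCTTTGAAACCAATGAGAACAAAG")

if __name__ == "__main__":
    print(longest_orf(FRAGMENT))  # (24, 'MKAEIKMFFETNENK')
```

As a consistency check on the stated lengths, 990 residues plus one stop codon occupy 2973 bases, which fits within the 3021-nucleotide sequence.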



Table 65B. Encoded NOV65 protein sequence (SEQ ID NO: 154). 

MKAEIKMFFETNENKDTMYQNLWDTFKAVCRGKFIALNAHKRKQERSKINILTSQLKELG 
KQEQTNSKASRRQKITKIRAELKEIETQKTLQKINESRSWFFEKINKIDRQLARPIKKKR 
EKNQIDATKNDKGDITTDPTEIQTTIREYYQHFYANILENLEEMDKFLDTYTLPRLNQEE 
VESLYRPITGSEIEAIINRLPTKKSPGPDGFTAEFYQRYKEELVPFLLKLFQTTEKEGLL 
PNS FYEAS 1 1 LTPKPGRDTTKKENFMP I SLMNIDAKVLNKI LAKAYPPHQKL IHHGQLGF 
IPGMQGWFNICKSINVVHHINRTNDKNHMIISIDAE 

LKIIRAIYDKPTANIIVNGQKLEAFPLKTGTRQGCPLSPLLFNIVLEVLARAIRQEKEIN 
GIQLGKEEVKLSLCADDMIVYLENPIVSAQNLLKLISNFSKVSGYKIOTQKSQAFLYINN 
RQTESQIMSELPFPITTKRIKYLGIQLTRDVKDLFKENYKPLLEIKEDTNKWKNIPCSWI 
GRINIVKIWILPKVIYRFNAIPIKLPLTFFTELEKTILKFIWNQKRTQIAKTILSKKNKA 
GGITLPDFKLYYKATVNKTAWYWYQNRYIDQWNRMEASEITPHIYNHLIFDKPDKNRQWG 
KDSLFNKWCWENWLAICRKLKLDPFLTPYTKINTRWIKDLNVRPNTIKTLEENLGNTIQD 
IGMGKVFMTKTPKAMATKVKIDKWDLIKLKSFCTAKETIIRVNRQPTEWEKIFATYPSDK 
GL I S R I YKELKQI YKKKKTTP S KSGQ I QEKKNNP I KKWAKDMSRHFS KED I YAANRMKKW 
SSSLVFREMQIKTTMRYHLMPWIWIIKKSGNNTCLRGCGEIGMLLHCWWECKLVQPLWK 
TVWRFLKDLEPEI PLDPAI PLLGI YPNNYKSCYYKDTRTRNFI VALFTI AKKDEFMS FAG 
TWMKLETI ILSKLSQGQKTKHHMFSLIGGS 



A search of sequence databases reveals that the NOV65 amino acid sequence has
842 of 951 amino acid residues (88%) identical to, and 867 of 951 amino acid residues (91%)
similar to, the 1010 amino acid residue patp:B38012 Human secreted protein encoded by gene
3 clone HNHCT15. Public amino acid databases include the GenBank databases, SwissProt,
PDB and PIR.

The disclosed NOV65 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 65C.



Table 65C. BLAST results for NOV65

Gene Index/Identifier                           Protein/Organism                                       Length (aa)  Identity (%)    Positives (%)   Expect
gi|106322|pir||B34087                           protein (L1H 3' region) - human                        1280         885/1023 (86%)  911/1023 (88%)  0.0
gi|17980447|gb|AAL50637.1| (AF421375)           unknown [Homo sapiens]                                 1275         870/1023 (85%)  904/1023 (88%)  0.0
gi|2072948|gb|AAC51261.1| (U93563)              p150 [Homo sapiens]                                    1275         870/1023 (85%)  903/1023 (88%)  0.0
gi|2136112|pir||S65824                          reverse transcriptase homolog - human transposon L1.1  1275         868/1023 (84%)  904/1023 (87%)  0.0
gi|5052951|gb|AAD38785.1|AF149422_2 (AF149422)  unknown [Homo sapiens]                                 1275         868/1023 (84%)  904/1023 (87%)  0.0



Table 65D lists the domain descriptions from DOMAIN analysis results against 
NOV65. This indicates that the NOV65 sequence has properties similar to those of other 
proteins known to contain this domain. 




Table 65D. Domain Analysis of NOV65 

gnl|Pfam|pfam00078, rvt, Reverse transcriptase (RNA-dependent DNA
polymerase). A reverse transcriptase gene is usually indicative of a
mobile element such as a retrotransposon or retrovirus. Reverse
transcriptases occur in a variety of mobile elements, including
retrotransposons, retroviruses, group II introns, bacterial msDNAs,
hepadnaviruses, and caulimoviruses.

CD-Length = 208 residues, 97.6% aligned

Score = 108 bits (269), Expect = 2e-24



The disclosed NOV65 nucleic acid of the invention encoding a secreted protein-like
protein includes the nucleic acid whose sequence is provided in Table 65A or a fragment
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may
be changed from the corresponding base shown in Table 65A while still encoding a protein
that maintains its secreted protein-like activities and physiological functions, or a fragment of
such a nucleic acid. The invention further includes nucleic acids whose sequences are
complementary to those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 13 percent of the bases may be so changed.

The disclosed NOV65 protein of the invention includes the secreted protein-like
protein whose sequence is provided in Table 65B. The invention also includes a mutant or
variant protein any of whose residues may be changed from the corresponding residue shown
in Table 65B while still encoding a protein that maintains its secreted protein-like activities
and physiological functions, or a functional fragment thereof. In the mutant or variant protein,
up to about 12 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this secreted protein-
like protein (NOV65) may function as a member of a "secreted protein family". Therefore, the
NOV65 nucleic acids and proteins identified here may be useful in potential therapeutic
applications implicated in (but not limited to) various pathologies and disorders as indicated
below. The potential therapeutic applications for this invention include, but are not limited to:
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV65 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the secreted protein-like
protein (NOV65) may be useful in gene therapy, and the secreted protein-like protein
(NOV65) may be useful when administered to a subject in need thereof. By way of
nonlimiting example, the compositions of the present invention will have efficacy for
treatment of patients suffering from CNS disorders, brain disorders including epilepsy, eating
disorders, schizophrenia, ADD; cancer; heart disease; inflammation and autoimmune disorders
including Crohn's disease, IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin
disorders, blood disorders; psoriasis, colon cancer, leukemia, AIDS; thalamus disorders;
metabolic disorders including diabetes and obesity; lung diseases such as asthma, emphysema,
cystic fibrosis, pancreatic disorders including pancreatic insufficiency and cancer; and
prostate disorders including prostate cancer, or other pathologies or conditions. The NOV65
nucleic acid encoding the secreted protein-like protein of the invention, or fragments thereof,
may further be useful in diagnostic applications, wherein the presence or amount of the nucleic
acid or the protein are to be assessed.

NOV65 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV65 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti- 
NOVX Antibodies" section below. The disclosed NOV65 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 
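
The selection of hydrophilic regions as candidate immunogens can be sketched with a sliding-window hydropathy profile. The example below is a hedged illustration only: the specification refers to hydrophobicity charts without naming a scale, so the Kyte-Doolittle scale, the 9-residue window, the -1.5 threshold, and the function names are all assumptions. `NOV65_FRAGMENT` is the second line of Table 65B.

```python
# Kyte-Doolittle hydropathy values (assumed scale; more negative = more hydrophilic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean hydropathy of every full-length window along the sequence."""
    scores = [KD[residue] for residue in protein]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def hydrophilic_windows(protein, window=9, threshold=-1.5):
    """Return (offset, peptide, mean) for windows at or below the threshold."""
    profile = hydropathy_profile(protein, window)
    return [(i, protein[i:i + window], round(mean, 2))
            for i, mean in enumerate(profile) if mean <= threshold]


# Second line of the NOV65 protein sequence in Table 65B.
NOV65_FRAGMENT = "KQEQTNSKASRRQKITKIRAELKEIETQKTLQKINESRSWFFEKINKIDRQLARPIKKKR"

if __name__ == "__main__":
    for offset, peptide, mean in hydrophilic_windows(NOV65_FRAGMENT):
        print(offset, peptide, mean)
```

Overlapping windows would normally be merged and then filtered for surface exposure before an immunogen is chosen; that step is omitted here. Residues outside the 20 standard amino acids raise a `KeyError` in this sketch.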

NOV66 

A disclosed NOV66 nucleic acid of 3120 nucleotides (also referred to as CG57783-01)
encoding a secreted protein-like protein is shown in Table 66A. The start and stop codons
are in bold letters.



Table 66A. NOV66 nucleotide sequence (SEQ ID NO:155).



CTGAATGACTACTGGGTACATAACAAAATGAAGACAGAAATAAAGATGTTCTTTGAAACCAATGAGAACA 
AAGACACAACATACCAGAATCTCTGGGACACATTCAAAGCAGTGTGTAGAGGGAAATTTACAGCACTAAA 
TGCCCATAAGAGAAAGCAGGAAAGATCCAAAATTGACACCCTAACATCACAATTAAAACAACTACAGAAG 
CAAGAGCAAACACTTTCAAAAGCTAGCAGAAGGCAAGAAATAACTAAGATCAGAGCAGAACTGAAGGAGA 
TAGAGACACAAAAAACCCTTCAAAAAATCAATGAATCCAGGAGCTGGTTTTTTGAAAAGATCAACAAAAT 
TGATACACTGCTAGCAAGACTAATAAAGAAGAAAAGAGAGAAGAATCAAATAGACGCAATAAAAAATGAT 
AAAG C AG AT AT C AC CACTG ATC C C AC AGAAAT ACAAACT AC CAT CAGAG AAT ACT ATAAAC ACCT C T ATG 
CAAATAAACTAGAAAATCTAGAAGAAATGGATAAATTCCTTGACACATACACCCTCCCAAGAATAAACCA 
GGAAGAAGTTGAATCTCTGAATAGACCAATAACAGGCTCTGAAATTGAGG CAATAATTAATAG CTTACCA 
ACCAAAAAAAGT CCAGGAC CAGACGGAT TCACAG C CGAATT CTAC CAGAAG TACAAGGAGGAG C TGATAC 
CATT C CTT CTGAAACTATT C CAAT CAATAGAAAAAGAGGG AATC CTC C CTAACT CATTTG ATGAGG C CAG 
CAT CAT C CTG AT AC CAAAG C CTGGCAG AG AC AC AACAAAAAAAG AG AATTTTAGAC C AAT AT CT C TGATG 
AACATTGATGCAAAAATCCTGAATAAAATACTGGCAAACCGAATCAAGCA^ 

AC(^TGATCAAGTGGGCTTC^TCCCTGGGATGCAAGGCTGGTTCAACATATGCAAATCAAT7W^CGTAAT 
C CAG CATATAAACAGAACCAAAGACAAAAAC CACATGAT TAT CTCAATAGATG CAGAAAAGG C CTTTGAC 
AAAATT CAACAG CACTT CATG CTAAAAACT CTCAATAAAT TAGGTATTGATGGGACGTATCT CAAAATAA 
T AAG AG CT ATCTGTGAC AAAC C C ACTGC C AATAT CAT ACTG AATGGGCAAAAACTGGAAGCGTT C CC T T T 
GAAAACTGGCAC^GACAAGGGTGCCCTCTCTCACCACTCCTATTCAACATAGTGTTGGAAGTCCTGGCC 
AGGGCAATCAGGCAGGAGAAGGAAATAAAGGGTATTCAGTTAGGAAAAGAGGAAGTCAAATTGTCTCTGT 
TTG CAGATGACATGATTG TATAT CTAG AAAAC C C CAT CAT CT CAG C C CAAAAT C T C CT TAAG CTGATAAG 
CAACTT CAGCAAAGT CTCAGGATACAAAAT CGATGTG CAAAAATCACAAG CATT CT TATACACCAATACA 
GACCAGACAGAGAGCCAAATCATGAGTGACCTCCCATTCACAATTGCTTCAAAGAGAATAAAATACCTAG 
GAAT CCAACT TACAAGGGATG TGAAGGAC CT CTT CAAGG AGAACTACAAAC CAC TG C TCAATGAAATAAA 
AAAGGATACAAACAAATGGAAG AACATT C CAGG C TCATGGATAGG AAG AAT CAAT AT CG TGAAAATGG CC 
ATAGAGCCCAAGGTAATTTATAGATTCAATGCCATCCCCATCAAGCTACCAATGACTTTCTTCACAGAAC 
TGGAGAAAAC TACT T TAAAGT T CATATGGAAC CAAAAGAGAG CCCACAT TG C CAAGT CAAT C CTAAAC CA 
AAAGAACAAAGCTGGAGGCATCACACCACCTGACTTCAAACTATACTACAAGGCTACAGTAAACAAAACA 
G CATGGTAC TGG TAC CAAAACAGAG ATAT AG AC CAG TGG AAC AG AACAG AT C CC TC AG AAAT AATGC C AC 
ACATCTACAACTATCTGATCTTTGACAAACCTGACAAAAAGAAGCAATGGGGAAAGGATTCCCTATTTAA 
TAAATGGTGCTGGGAAAACTGGCTAGCCATAGGTAGAAAGCTGAAACTGGACCCCTTCCTTACACCTTAT 
ACAAAAATTAATTCAAGATGGATTAAAGACTTAAATGTTAGACCTAAAACCATAAAAACCCTAGAAGGAA 
ACCTAGGTATTAC CATTGAGGACACAGGCATGGGCAAGGACTTCATGTCTAAAACAC CAAAAGCAATGGC 
AAC AAAAG ACAAAATTG AC AAATGGG AT CT AATT AAACT AAAG AG CTT CTG CAC AG C AAAAGAAACTACC 
AT CAGAG TGAACAGG CAACCTACAAAATGGGAG AAAC T TT T TG CAACCTAT T CAT C TGACAAAGGG C TAA 
TATCCAGAATCTACAAAGAACTGAAACAAATTTACAAGAAAAGAACT^AA 

TAACAAC C C CAT CAAAAAG CGGG CAAAGGATATG AACAGACACT T C TCAAAAGAAGACATTTATG CAG CC 
AAAAG AC ACATGAAAAAATG CT CATC AT C ACTGG C CAT CAGAG AAATG CAAATGAAAAC CACAATGAG AT 
ACCAT CT CACAC CAGTTAGAATGG CGAT CATT AAAAAGTCAGGAAACAACAGG TG CTGG AGAGGATG TGG
AGAAATAGGAACACTTTTACGCTGTTGGTGGGACTGTAAACTAGTTCAACCATTGTGGAAGACAGTGTGG 
CGATTCCTCAGGGATCTAGAACTAGAAATACCATTTGACCCAGCCATCCCATTACTGGGTATATACCCAA 
AGG AT TATAAAT CATGCTG CTATAAAGACACATG CAGACG TATGTT TAT TG CGG CAC TAT T CACAATAG C 
AAAGACTTGGAACCAACCCAAATGTCCAACAATGATAGACTGGATTAAG7\AAATGTGGCACATATACACC 
ATG AAATAC TATG CAG CCAT AAAAAATG ATGAGT T CATG T C CT TTGTAGGGACATGGATGAAG C TGGAAA 
C CAT CATT C T CAG CAAAC TAT CACAAGG ACAG AAAACCAAACAC CACATG TT CT CAC T CATAGG TGGAAA 
TTGAACAATGAGAATACTTTGACACAGGAAGGGGAACATC 



In a search of public sequence databases, the NOV66 nucleic acid sequence, located on
chromosome 13, has 2399 of 2567 bases (93%) identical to a gb:GENBANK-
ID:HSNOD1G2|acc:AF149774.1 mRNA from Homo sapiens (Homo sapiens NOD1 protein
(NOD1) gene, exons 4 through 14 and complete cds). Public nucleotide databases include all
GenBank databases and the GeneSeq patent database.

The disclosed NOV66 polypeptide (SEQ ID NO:156) encoded by SEQ ID NO:155
has 1018 amino acid residues and is presented in Table 66B using the one-letter amino acid
code. Signal P, Psort and/or Hydropathy results predict that NOV66 has no signal peptide and
is likely to be localized in the cytoplasm with a certainty of 0.6000.



Table 66B. Encoded NOV66 protein sequence (SEQ ID NO:156). 

MKTEIKMFFETNENKDTTYQNLWDTFKAVCRGKFTALNAHKRKQERSKIDTLTSQLKQLQ 
KQEQTLS KAS RRQE I TKI RAELKE I ETQKTLQKINES RS WFFEKINKIDTLLARL I KKKR 
E KNQ IDA I KNDKAD I TTDPTE I QTT I RE YYKHLYANKLENLEEMDKFLDT YTLPR I NQEE 
VESLNRP ITGSE IEAI INSLPTKKS PGPDGFTAEFYQKYKEELI PFLLKLFQS IEKEG IL 
PNS FDE AS 1 1 L I PKPGRDTTKKENFRP I S LMNIDAKI LNKI LANR I KQH I KKL IHHDQVG 
FIPGMQGWFNICKSINVIQHINRTKDKNHMIISIDAEKAFDKIQQHFMLKTLNKLGIDGT 
YLKIIRAICDKPTANIILNGQKLEAFPLKTGTRQGCPLSPLLFNIVLEVLARAIRQEKEI 
KGIQLGKEEVKLSLFADDMIVYLENPIISAQNLLKLISNFSKVSGYKIDVQKSQAFLYTN 
TDQTESQ IMSDLPFT I ASKR I KYLG I QLTRDVKDLFKENYKPLLNE I KKDTNKWKNI PGS 
WIGRINIVKMAIEPKVIYRFNAIPIKLPMTFFTELEKTTLKFIWNQKRAHIAKSILNQKN 
KAGG I TPPDFKLYYKATVNKTAWYWYQNRD I DQWNRTD PS E I MPH I YNYLI FDKPDKKKQ 
WGKDSLFNKWCWENWLAIGRKLKLDPFLTPYTKINSRWIKDLNVRPKTIKTLEGNLGITI 
EDTGMGKDFMSKTPKJ^VIATKDKIDKWDLIKLKSFCTAKETTIRVNRQPTKWEKLFATYSS
DKGLISRIYKELKQIYKKRTNNPIKKKTNNPIKKRAKDMNRHFSKEDIYAAKRHMKKCSS
SLAIREMQMKTTMRYHLTPVRMAIIKKSGNNRCWRGCGEIGTLLRCWWDCKLVQPLWKTV
WRFLRDLELEIPFDPAIPLLGIYPKDYKSCCYKDTCRRMFIAALFTIAKTWNQPKCPTMI 
DWIKKMWHIYTMKYYAAIKNDEFMSFVGTWMKLETIILSKLSQGQKTKHHMFSLIGGN



A search of sequence databases reveals that the NOV66 amino acid sequence has
947 of 1018 amino acid residues (93%) identical to, and 965 of 1018 amino acid residues
(94%) similar to, the 1010 amino acid residue patp:B38012 protein from human (Human
secreted protein (L1H 3' region)). Public amino acid databases include the GenBank databases,
SwissProt, PDB and PIR.

The disclosed NOV66 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 66C.

Table 66C. BLAST results for NOV66

Gene Index/Identifier                           Protein/Organism              Length (aa)  Identity (%)       Positives (%)      Expect
[illegible] 87                                  [illegible] region) - human   [illegible]  [illegible] (93%)  [illegible] (95%)  [illegible]
gi|2072958|gb|AAC51267.1| (U93567)              putative p150 [Homo sapiens]  1275         945/1018 (92%)     971/1018 (94%)     0.0
gi|5052951|gb|AAD38785.1|AF149422_2 (AF149422)  unknown [Homo sapiens]        1275         948/1018 (93%)     972/1018 (95%)     0.0
gi|5070622|gb|AAD39215.1|AF148856_2 (AF148856)  unknown [Homo sapiens]        1275         945/1018 (92%)     973/1018 (94%)     0.0
gi|2072953|gb|AAC51264.1| (U93565)              p150 [Homo sapiens]           1275         943/1018 (92%)     971/1018 (94%)     0.0



Table 66D lists the domain descriptions from DOMAIN analysis results against 
NOV66. This indicates that the NOV66 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 66D. Domain Analysis of NOV66 

gnl|Pfam|pfam00078, rvt, Reverse transcriptase (RNA-dependent DNA
polymerase). A reverse transcriptase gene is usually indicative of a
mobile element such as a retrotransposon or retrovirus. Reverse
transcriptases occur in a variety of mobile elements, including
retrotransposons, retroviruses, group II introns, bacterial msDNAs,
hepadnaviruses, and caulimoviruses.

CD-Length = 208 residues, 93.8% aligned 

Score = 114 bits (285) , Expect = 3e-26 



The disclosed NOV66 nucleic acid of the invention encoding a secreted protein-like 
protein includes the nucleic acid whose sequence is provided in Table 66A or a fragment 
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may 
be changed from the corresponding base shown in Table 66A while still encoding a protein
that maintains its secreted protein-like activities and physiological functions, or a fragment of 
such a nucleic acid. The invention further includes nucleic acids whose sequences are 
complementary to those just described, including nucleic acid fragments that are 
complementary to any of the nucleic acids just described. The invention additionally includes 
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include 
chemical modifications. Such modifications include, by way of nonlimiting example, 
modified bases, and nucleic acids whose sugar phosphate backbones are modified or 
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic 
acids, and their complements, up to about 7 percent of the bases may be so changed. 

The disclosed NOV66 protein of the invention includes the secreted protein-like 
protein whose sequence is provided in Table 66B. The invention also includes a mutant or
variant protein any of whose residues may be changed from the corresponding residue shown 
in Table 66B while still encoding a protein that maintains its secreted protein-like activities 
and physiological functions, or a functional fragment thereof. In the mutant or variant protein, 
up to about 7 percent of the residues may be so changed. 
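
The variation limit stated above appears to be the complement of the best database identity reported for the sequence (100 minus 93 gives 7); this reading is an inference from the numbers in this section, not something the text states. A minimal Python sketch of the arithmetic, with illustrative function names that are not part of the disclosure:

```python
def variable_percent(identity_percent: int) -> int:
    """Percentage of positions that may differ, taken as 100 minus identity."""
    if not 0 <= identity_percent <= 100:
        raise ValueError("identity must be between 0 and 100")
    return 100 - identity_percent


def max_changed_positions(length: int, percent: int) -> int:
    """Largest whole number of positions not exceeding `percent` of `length`."""
    # Integer arithmetic; the text says "up to about", so this is approximate.
    return (length * percent) // 100


# NOV66 protein: 1018 residues, 93% identity to the closest database hit
limit = variable_percent(93)
print(limit, max_changed_positions(1018, limit))
```

The same complement relation matches the NOV67 figures given later in this section (57% nucleotide identity and 43 percent of bases; 35% amino acid identity and 65 percent of residues).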

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this secreted protein-like
protein (NOV66) may function as a member of a "secreted protein family". Therefore, the
NOV66 nucleic acids and proteins identified here may be useful in potential therapeutic
applications implicated in (but not limited to) various pathologies and disorders as indicated
below. The potential therapeutic applications for this invention include, but are not limited to:
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV66 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the secreted protein-like
protein (NOV66) may be useful in gene therapy, and the secreted protein-like protein
(NOV66) may be useful when administered to a subject in need thereof. By way of
nonlimiting example, the compositions of the present invention will have efficacy for
treatment of patients suffering from CNS disorders, brain disorders including epilepsy, eating
disorders, schizophrenia, ADD; cancer; heart disease; inflammation and autoimmune disorders
including Crohn's disease, IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin
disorders, blood disorders; psoriasis; colon cancer; leukemia; AIDS; thalamus disorders;
metabolic disorders including diabetes and obesity; lung diseases such as asthma, emphysema,
cystic fibrosis; pancreatic disorders including pancreatic insufficiency and cancer; and
prostate disorders including prostate cancer, or other pathologies or conditions. The NOV66
nucleic acid encoding the secreted protein-like protein of the invention, or fragments thereof,
may further be useful in diagnostic applications, wherein the presence or amount of the nucleic
acid or the protein is to be assessed.

NOV66 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV66 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti- 
NOVX Antibodies" section below. The disclosed NOV66 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 

NOV67 

NOV67 includes two acyltransferase-like proteins disclosed below. The disclosed 
sequences have been named NOV67a and NOV67b. 

NOV67a 

A disclosed NOV67a nucleic acid of 1116 nucleotides (also referred to as CG57823-01)
encoding an acyltransferase-like protein is shown in Table 67A. The start and stop codons
are in bold letters.



Table 67A. NOV67a nucleotide sequence (SEQ ID NO:157). 

CCGGCACCGGCGTCAAGGCCATGGCGCTGTGCCCGGGATGGGTACACACCGAATTCCACT 
CACGCGCCAACGTCACCGGCAACCATCTGCCGGACTTTTTCTGGATCGACGCCGAAGTTC 
TGGTACGCGAGGCTCTCAACGACCTTGACCATGACAAGGTAGTATCCATTCCTACCCCGC 
TCTGGAAGTTCTTCATCGCAGTGGCCACACATACCCCACGTTCCGCTATGAGATTCCTGT 
CACGAACTCTGTCCTCGTCTCGAGACAAGGACGACCATCCTCGACACACTCCGGGAGGCG 
AGGCCTGAGATGGCCAGCGTCAAACCCACTAAGGACCGGGGCCGGTACACCAATGATCTG
TCCGCCGCGACGCGGCAGGCAGCGAACATGCTTCTGCTGCGTCCTTTGGTGTGGAAAGTC
GTCAAAGTGAGCGTCCACGGAGCCGACAACCTCGACGGGCTCGACGGTGCTTACGTCGCC
GTCGCTAACCATTCCTCCCACCTCGACGCGCCGCTCGTTTTTGGGGCCCTTCCCAAGCGG 
CTGTCAAAGTACCTAGCTACCGGGGCCGCTGCTGACTATTTCTTCACCGCCTGGTGGAAG 
GCCATCGCTCCGGTGCTCTTCTTCAACGCGTTCCCGGTCGACCGAGGCAAAGGCAAAAGT 
AAGCAAGGTGCCCGTAGTCCCCGTTCCCACCGCGGTATGGCTGGGTCACTGCTGACAGAT 
GGCGTCCCCCTGCTGATCTTTCCGGAGGGCACCCGGTCTCGCACCGGCGCAATGGGCACC 
TTCAAACCTGGGGCTGCCGCATTGGCTATTTCACGTGGGGTTCCGGTTATCCCGATTGCT 
TTAGTAGGAGCATGGGCGGCTATGCCGTCCGAGCAAGCCAGGTTACCAAAAGGACGTCCA 
TTGGTCCACGTGGCTATTGGACACCCTATGGACCCTGTTCCCGGCGAGATCGCCCACCAA 
TTCTCCGAACGGATTCGTCGCCAGGTCATTGAGTTGCACGACCAAACCGCCCGCGCCTAC 
GGCATGCCAACCCTTGACGAATACGGACGCCACCGCGCGCTAAGCCAGGCCTCCGAGAGC 
GGCGACACCGCATCCACCAACCACTCGACGTGACAC




In a search of public sequence databases, the NOV67a nucleic acid sequence has 323 
of 558 bases (57%) identical to a gb:GENBANK-ID:AF263912|acc:AF263912.1 mRNA from
Streptomyces noursei (Streptomyces noursei ATCC 11455 nystatin biosynthetic gene cluster,
complete sequence). Public nucleotide databases include all GenBank databases and the 
GeneSeq patent database. 

The disclosed NOV67a polypeptide (SEQ ID NO: 158) encoded by SEQ ID NO: 157 
has 267 amino acid residues and is presented in Table 67B using the one-letter amino acid 
code. Signal P, Psort and/or Hydropathy results predict that NOV67a has a signal peptide and
is likely to be localized at the ER plasma membrane with a certainty of 0.85000. The most
likely cleavage site for a NOV67a peptide is between amino acids 44 and 45.
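
The relationship between the nucleotide sequence of Table 67A and the encoded polypeptide can be checked by translating from the start codon. The sketch below is illustrative only: it assumes the standard genetic code, the two fragments are copied from Table 67A with extraction spacing removed, and the names `CODON_TABLE` and `translate` are hypothetical helpers, not part of the disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 0 until a stop codon or the end."""
    dna = "".join(dna.split()).upper()  # drop whitespace left by text extraction
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":  # stop codon
            break
        protein.append(residue)
    return "".join(protein)


# Region beginning at the ATG start codon of Table 67A
print(translate("ATGGCCAGCGTCAAACCCACTAAGGACCGGGGCCGG"))  # MASVKPTKDRGR
# Final codons of the open reading frame, ending at the TGA stop codon
print(translate("ACCAACCACTCGACGTGA"))  # TNHST
```

Both outputs agree with the first and last residues of the 267-residue polypeptide, which corresponds to a coding region of 3 x 267 + 3 = 804 nucleotides including the stop codon.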

Table 67B. Encoded NOV67a protein sequence (SEQ ID NO:158).

MASVKPTKDRGRYTNDLSAATRQAANMLLLRPLVWKVVKVSVHGADNLDGLDGAYVAVAN
HSSHLDAPLVFGALPKRLSKYLATGAAADYFFTAWWKAIAPVLFFNAFPVDRGKGKSKQG
ARSPRSHRGMAGSLLTDGVPLLIFPEGTRSRTGAMGTFKPGAAALAISRGVPVIPIALVG
AWAAMPSEQARLPKGRPLVHVAIGHPMDPVPGEIAHQFSERIRRQVIELHDQTARAYGMP
TLDEYGRHRALSQASESGDTASTNHST



A search of sequence databases reveals that the NOV67a amino acid sequence has 65 
of 181 amino acid residues (35%) identical to, and 96 of 181 amino acid residues (53%) 
similar to, the 240 amino acid residue ptnr:TREMBLNEW-ACC:CAC01452 protein from 
Streptomyces coelicolor (PUTATIVE ACYLTRANSFERASE). Public amino acid databases 
include the GenBank databases, SwissProt, PDB and PIR. 

NOV67a is expressed in at least Bone, Bone Marrow, Brain, Liver, Lung, Lymph 
node, Placenta, Prostate, Thalamus, Thyroid and Uterus. This information was derived by 
determining the tissue sources of the sequences that were included in the invention including 
but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE 
sources. 

NOV67b 

A disclosed NOV67b nucleic acid of 906 nucleotides (also referred to as CG57823-02)
encoding an acyltransferase-like protein is shown in Table 67C. The start and stop codons
are in bold letters.



Table 67C. NOV67b nucleotide sequence (SEQ ID NO:159). 

ATACCCCACGTTCCGCTATGAGATTCCTGTCACGAACTCTGTCCTCGTCTCGAGACAAGG
ACGACCACCCTCGACACACTCCGGGAGGCGAGGCCTGAGATGGCCAGCGTCAAACCCACT
AAGGACCGGGGCCGGTACACCAATGATCTGTCCGCCGCGACGCGGCAGGCAGCGAACATG
CTTCTGCTGCGTCCTTTGGTGTGGAAAGTCGTCAAAGTGAGCGTCCACGGAGCCGACAAC
CTCGACGGGCTCGACGGTGCCTACGTCGCCGTCGCTAACCATTCCTCCCACCTCGACGCG 
CCGCTCGTTTTTGGGGCCCTTCCCAAGCGGCTGTCAAAGTACCTAGCTACCGGGGCCGCT 
GCTGACTATTTCTTCACCGCCTGGTGGAAGGCCATCGCTCCGGTGCTCTTCTTCAACGCG 
TTCCCGGTCGACCGAGGCAAAGGCAAAAGTAAGCAAGGTGCCCGTAGTCCCCGTTCCCAC 
CGCGGTATGGCTGGGTCACTGCTGACAGATGGCGTCCCCCTGCTGATCTTTCCGGAGGGC 
ACCCGGTCTCGCACCGGTGCAATGGGCACCTTCAAACCTGGGGCTGCCGCATTGGCTATT 
TCACGTGGGGTTCCGGTTATCCCGATTGCTTTAGTAGGAGCATGGGCGGCTATGCCGTCC 
GAGCAAGCCGGGTTACCAAAAGGACGCCCATCGGTCCACGTGGCTATTGGACACCCTATG 
GACCCTGTTCCCGGCGAGATCGCCCACCAATTCTCCGAACGGATTCGTCGCCAGGTCATT 
GAGTTGCACGACCAAACCGCCCGCGCCTACGGCATGCCAACCCTTGACGAATACGGACGC 
CACCGCGCGCTAAGCCAGGCCTCCGAGAGCGGCGACACCGCATCCACCAACCACTCGACG 
TGACAC 



In a search of public sequence databases, the NOV67b nucleic acid sequence has 323 of
558 bases (57%) identical to a gb:GENBANK-ID:AF263912|acc:AF263912.1 mRNA from
Streptomyces noursei (Streptomyces noursei ATCC 11455 nystatin biosynthetic gene cluster,
complete sequence). Public nucleotide databases include all GenBank databases and the 
GeneSeq patent database. 

The disclosed NOV67b polypeptide (SEQ ID NO:160) encoded by SEQ ID NO:159
has 267 amino acid residues and is presented in Table 67D using the one-letter amino acid 
code. Signal P, Psort and/or Hydropathy results predict that NOV67b has a signal peptide and 
is likely to be localized at the ER plasma membrane with a certainty of 0.85000. The most 
likely cleavage site for a NOV67b peptide is between amino acids 44 and 45. 
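
The predicted cleavage position can be applied to the sequence to separate the signal peptide from the remainder of the polypeptide. The following Python sketch is illustrative only. It uses the first 60 residues as encoded by the nucleotide sequence of Table 67C (the codons GTC GTC at residues 37 and 38 give VV), and the helper name `split_signal_peptide` is hypothetical.

```python
from typing import Tuple


def split_signal_peptide(protein: str, cleavage_after: int) -> Tuple[str, str]:
    """Split a protein after residue `cleavage_after` (1-based numbering)."""
    if not 0 < cleavage_after < len(protein):
        raise ValueError("cleavage site must fall inside the sequence")
    return protein[:cleavage_after], protein[cleavage_after:]


# First 60 residues of NOV67b (assumption: read from the Table 67C codons)
N_TERMINUS = "MASVKPTKDRGRYTNDLSAATRQAANMLLLRPLVWKVVKVSVHGADNLDGLDGAYVAVAN"

# Cleavage predicted between amino acids 44 and 45
signal, remainder = split_signal_peptide(N_TERMINUS, 44)
print(signal)
print(remainder)
```

Under this reading, the signal peptide spans residues 1 to 44 and the processed polypeptide begins at residue 45.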



Table 67D. Encoded NOV67b protein sequence (SEQ ID NO: 160). 



MASVKPTKDRGRYTNDLSAATRQAANMLLLRPLVWKVVKVSVHGADNLDGLDGAYVAVAN
HSSHLDAPLVFGALPKRLSKYLATGAAADYFFTAWWKAIAPVLFFNAFPVDRGKGKSKQG
ARSPRSHRGMAGSLLTDGVPLLIFPEGTRSRTGAMGTFKPGAAALAISRGVPVIPIALVG
AWAAMPSEQAGLPKGRPSVHVAIGHPMDPVPGEIAHQFSERIRRQVIELHDQTARAYGMP 
TLDEYGRHRALSQASESGDTASTNHST 



A search of sequence databases reveals that the NOV67b amino acid sequence has 65 
of 181 amino acid residues (35%) identical to, and 96 of 181 amino acid residues (53%) 
similar to, the 240 amino acid residue ptnr:TREMBLNEW-ACC:CAC01452 protein from 
Streptomyces coelicolor (PUTATIVE ACYLTRANSFERASE). Public amino acid databases 
include the GenBank databases, SwissProt, PDB and PIR. 

NOV67b is expressed in at least Bone, Bone Marrow, Brain, Liver, Lung, Lymph 
node, Placenta, Prostate, Thalamus, Thyroid and Uterus. This information was derived by
determining the tissue sources of the sequences that were included in the invention including 
but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE 
sources. 

The disclosed NOV67 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 67E.

Table 67E. BLAST results for NOV67

Gene Index/Identifier                     Protein/Organism                                   Length (aa)  Identity (%)    Positives (%)   Expect
gi|15644441|ref|NP_229493.1| (NC_000853)  1-acyl-sn-glycerol-3-phosphate acetyltransferase,  247          64/214 (29%)    100/214 (45%)   3e-14
                                          putative [Thermotoga maritima]
gi|9716114|emb|CAC01452.1| (AL391014)     putative acyltransferase                           240          63/184 (34%)    92/184 (49%)    e-12
                                          [Streptomyces coelicolor A3(2)]
gi|11693120|gb|AAG38841.1| (AY010120)     putative acetyltransferase                         249          68/203 (33%)    94/203 (45%)    4e-12
                                          [Xanthomonas oryzae pv. oryzae]
gi|15607028|ref|NP_214410.1| (NC_000918)  2-acylglycerophosphoethanolamine acyltransferase   211          59/199 (29%)    92/199 (45%)    e-11
                                          [Aquifex aeolicus]
gi|15606303|ref|NP_213682.1| (NC_000918)  long-chain-fatty-acid CoA ligase                   823          47/149 (31%)    72/149 (47%)    3e-10
                                          [Aquifex aeolicus]



Tables 67F-G list the domain descriptions from DOMAIN analysis results against 
NOV67. This indicates that the NOV67 sequence has properties similar to those of other 
proteins known to contain this domain.



Table 67F. Domain Analysis of NOV67 

gnl|Pfam|pfam01553, Acyltransferase, Acyltransferase. This family
contains acyltransferases involved in phospholipid biosynthesis and other 
proteins. This family also includes tafazzin, the Barth syndrome gene. 
CD-Length = 185 residues, 95.7% aligned 

Score = 92.4 bits (228), Expect = 3e-20

Table 67G. Domain Analysis of NOV67

gnl|Smart|smart00563, PlsC, Phosphate acyltransferases; Function in 
phospholipid biosynthesis and have either glycerophosphate,
1-acylglycerolphosphate, or 2-acylglycerolphosphoethanolamine acyltransferase
activities. Tafazzin, the product of the gene mutated in patients with Barth 
syndrome, is a member of this family. 

CD-Length = 118 residues, 99.2% aligned 

Score = 81.6 bits (200), Expect = 5e-17 
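
The "CD-Length" and "% aligned" values in Tables 67F and 67G can be converted into an approximate count of aligned domain positions. This is a minimal Python sketch, assuming the percentage is taken relative to the CD-Length value (an interpretation of the output format, not something the tables state); the function name `aligned_residues` is illustrative.

```python
def aligned_residues(cd_length: int, percent_aligned: float) -> int:
    """Approximate number of domain positions covered by the alignment."""
    if cd_length <= 0 or not 0.0 <= percent_aligned <= 100.0:
        raise ValueError("invalid domain length or percentage")
    # The reported percentage is rounded to one decimal, so round the product.
    return round(cd_length * percent_aligned / 100)


# Table 67F: Pfam acyltransferase domain, 185 residues, 95.7% aligned
print(aligned_residues(185, 95.7))  # prints 177
# Table 67G: SMART PlsC domain, 118 residues, 99.2% aligned
print(aligned_residues(118, 99.2))  # prints 117
```

On this reading, the NOV67 alignment covers nearly the full length of both domain models.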



The polyene macrolide antibiotic nystatin produced by Streptomyces noursei ATCC
11455 is an important antifungal agent. The nystatin molecule contains a polyketide moiety
represented by a 38-membered macrolactone ring to which the deoxysugar mycosamine is 
attached. Molecular cloning and characterization of the genes governing the nystatin 
biosynthesis is of considerable interest because this information can be used for the generation 
of new antifungal antibiotics. 

Glycerol-3-phosphate 1-acyltransferase (E.C. 2.3.1.15; G3PAT) catalyses the
incorporation of an acyl group from either acyl-acyl carrier proteins (acylACPs) or acylCoAs
into the sn-1 position of glycerol 3-phosphate to yield 1-acylglycerol 3-phosphate. Crystals of
squash G3PAT have been obtained by the hanging-drop method of vapour diffusion using
PEG 4000 as the precipitant. These crystals are most likely to belong to space group
P2(1)2(1)2(1), with approximate unit-cell parameters a = 61.1, b = 65.1, c = 103.3 Å, alpha =
beta = gamma = 90 degrees and a monomer in the asymmetric unit. X-ray diffraction data to
1.9 Å resolution have been collected in-house using a MAR 345 imaging-plate system. PMID:
11223529.

Exposure of heparinized human plasma to gas-phase cigarette smoke produced a dose- 
dependent reduction in the activity of platelet-activating factor acetylhydrolase (PAF-AH). 
Reductions of nearly 50% in PAF-AH activity were observed following exposure to gas-phase 
smoke from four cigarettes over an 8-h period. During this time of exposure, 
lecithin:cholesterol acyltransferase (LCAT) was rendered almost completely inactive (>80%).
In contrast, paraoxonase was totally unaffected by cigarette smoke. Supplementation of 
plasma with 1 mM reduced glutathione was found to protect both PAF-AH and LCAT from
cigarette smoke, suggesting that cysteine modifications may have contributed to the inhibition
of these two enzymes. 

Although the atheroprotective role of high-density lipoprotein (HDL) has been well
documented in epidemiological and animal studies, highly effective therapeutic approaches for
the selective increase of plasma HDL levels or function are not yet available. Several
mechanisms by which HDL exerts an atheroprotective effect have been proposed on the basis
of experiments in vitro and in vivo. These mechanisms include directing excess cellular
cholesterol from the peripheral tissues to the liver in 'reverse cholesterol transport', inhibiting
oxidative modification or aggregation of LDL, and modulating inflammatory responses to
favour vasoprotection. High density lipoproteins (HDL) mediate reverse cholesterol transport
as well as the clearance of oxidation

The disclosed NOV67 nucleic acid of the invention encoding an acyltransferase-like
protein includes the nucleic acid whose sequence is provided in Table 67A or a fragment
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may
be changed from the corresponding base shown in Table 67A while still encoding a protein
that maintains its acyltransferase-like activities and physiological functions, or a fragment of
such a nucleic acid. The invention further includes nucleic acids whose sequences are
complementary to those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 43 percent of the bases may be so changed.

The disclosed NOV67 protein of the invention includes the acyltransferase-like protein
whose sequence is provided in Table 67B. The invention also includes a mutant or variant
protein any of whose residues may be changed from the corresponding residue shown in Table
67B while still encoding a protein that maintains its acyltransferase-like activities and
physiological functions, or a functional fragment thereof. In the mutant or variant protein, up
to about 65 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this acyltransferase-like
protein (NOV67) may function as a member of an "acyltransferase family". Therefore, the
NOV67 nucleic acids and proteins identified here may be useful in potential therapeutic
applications implicated in (but not limited to) various pathologies and disorders as indicated
below. The potential therapeutic applications for this invention include, but are not limited to:
protein therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV67 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the acyltransferase-like
protein (NOV67) may be useful in gene therapy, and the acyltransferase-like protein (NOV67)
may be useful when administered to a subject in need thereof. By way of nonlimiting
example, the compositions of the present invention will have efficacy for treatment of patients
suffering from Osteoporosis, Hypercalcemia, Arthritis, Ankylosing spondylitis, Scoliosis,
Hemophilia, hypercoagulation, idiopathic thrombocytopenic purpura, autoimmune
disease, allergies, immunodeficiencies, transplantation, Graft versus host, Von Hippel-Lindau
(VHL) syndrome, Cirrhosis, Lymphedema, Fertility, Alzheimer's disease, Stroke, Tuberous
sclerosis, Parkinson's disease, Huntington's disease, Cerebral palsy, Epilepsy, Lesch-Nyhan
syndrome, Multiple sclerosis, Ataxia-telangiectasia, Leukodystrophies, Behavioral disorders,
Addiction, Anxiety, Pain, Neuroprotection, or other pathologies or conditions. The NOV67
nucleic acid encoding the acyltransferase-like protein of the invention, or fragments thereof,
may further be useful in diagnostic applications, wherein the presence or amount of the
nucleic acid or the protein is to be assessed.

NOV67 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV67 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV67 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.

NOV68 

A disclosed NOV68 nucleic acid of 1388 nucleotides (also referred to as CG57801-01)
encoding a guanine nucleotide exchange factor-like protein is shown in Table 68A. The
start and stop codons are in bold letters.



Table 68A. NOV68 nucleotide sequence (SEQ ID NO:161). 



AACAGGTCACAGGAAAGTAGGAACCTGGGAGGTCTCCTGCCTCTGCTCCATGCTGCCCTC
CCCTGCCCCAAGTCACCTGTCCCCTGTATGTGGGTTGCAGTTGCGAGTGAATCAGGAAGA 
GCTGTCGGAAAACTCCAGCAGCACCCCCAGTGAGGAGCAGGACGAGGAGGCCAGCCAGAG 
CCGCCACAGACACTGTGAGAACAAGCAGCAGATGCGGACCAACGTCATCCGGGAGATCAT
GGACACCGAGCGGGTGTACATCAAACACCTCAGGGACATCTGTGAGGGCTATATCCGACA 
GTGCCGCAAGCACACAGGAATGTTCACCGTTGCGCAGCTAGCCACTATTTTTGGAAACAT 
TGAAGATATTTACAAATTCCAAAGAAAGTTTCTGAAAGACCTTGAGAAACAGTACAACAA
AGAGGAACCTCACTTAAGTGAAATAGGATCTTGCTTTCTTCAAAATCAAGAGGGCTTTGC 
CATCTATTCCGAGTACTGCAACAACCACCCGGGCGCCTGCCTGGAGCTCGCCAACCTCAT 
GAAGCAGGGCAAGTACAGACATTTCTTTGAAGCCTGCCGCCTGCTGCAGCAGATGATTGA 
CATCGCCATCGACGGGTTCCTGCTCACACCAGTGCAGAAGATCTGCAAATACCCGCTGCA 
GCTGGCCGAGCTGCTCAAGTATACCACACAGGAACACAGTGATTACAGCAACATAAAGGC 
AGCATATGAGGCCATGAAGAATGTGGCCTGTCTGATCAACGAGCGCAAGCGCAAGCTGGA 
GAGCATCGACAAGATAGCTCGCTGGCAGGTGTCTATCGTGGGCTGGGAGGGACTGGATAT 
CTTAGACCGAAGCTCAGAATTGATTCATTCTGGGGAGCTGACCAAAATCACTAAGCAAGG
CAAAAGCCAGCAGCGGACGTTCTTCCTGTTTGACCACCAGCTGGTGTCCTGCAAGAAGGA 
CCTGCTGCGCAGGGACATGCTGTACTACAAGGGCCGGCTGGACATGGATGAGATGGAGCT 
TGTGGACCTGGGGGATGGGCGCGACAAGGACTGCAACCTCAGCGTGAAAAATGCCTTCAA 
GCTCGTCAGTAGGACCACAGACGAGGTTTATTTGTTTTGTGCCAAAAAACAAGAAGACAA
GGCGAGGTGGCTGCAGGCCTGTGCAGATGAAAGGAGGCGGGTGCAAGAGGACAAGGAGAT 
GGGAATGGAAATTTCAGAAAACCAGAAGAAACTTGCCATGTTAAATGCTCAAAAGGCAGG
ACATGGAAAGTCAAAAGGTAAGTTATGGAGAAGGCTTTGTCCCCTTAATGCTTATCAGTA 
TTCTCCTGAAAATGGGAGCATACCCCAAGTTGTCAGCCTGTGACCAGCTTGGAGCAAGGA
GAACAGTA 



In a search of public sequence databases, the NOV68 nucleic acid sequence, located on
chromosome 13, has 822 of 1143 bases (71%) identical to a
gb:GENBANK-ID:AB029035|acc:AB029035.1 mRNA from Homo sapiens (Homo sapiens mRNA for
KIAA1112 protein, partial cds). Public nucleotide databases include all GenBank databases
and the GeneSeq patent database. 

The disclosed NOV68 polypeptide (SEQ ID NO: 162) encoded by SEQ ID NO: 161 
has 437 amino acid residues and is presented in Table 68B using the one-letter amino acid 
code. Signal P, Psort and/or Hydropathy results predict that NOV68 is localized in the
cytoplasm with a certainty of 0.3000. 



Table 68B. Encoded NOV68 protein sequence (SEQ ID NO: 162). 

MLPSPAPSHLSPVCGLQLRVNQEELSENSSSTPSEEQDEEASQSRHRHCENKQQMRTNVI
REIMDTERVYIKHLRDICEGYIRQCRKHTGMFTVAQLATIFGNIEDIYKFQRKFLKDLEK
QYNKEEPHLSEIGSCFLQNQEGFAIYSEYCNNHPGACLELANLMKQGKYRHFFEACRLLQ
QMIDIAIDGFLLTPVQKICKYPLQLAELLKYTTQEHSDYSNIKAAYEAMKNVACLINERK 
RKLESIDKIARWQVSIVGWEGLDILDRSSELIHSGELTKITKQGKSQQRTFFLFDHQLVS 
CKKDLLRRDMLYYKGRLDMDEMELVDLGDGRDKDCNLSVKNAFKLVSRTTDEVYLFCAKK 
QEDKARWLQACADERRRVQEDKEMGMEISENQKKLAMLNAQKAGHGKSKGKLWRRLCPLN 
AYQYSPENGSIPQVVSL



A search of sequence databases reveals that the NOV68 amino acid sequence has 253 
of 402 amino acid residues (62%) identical to, and 317 of 402 amino acid residues (78%)
similar to, the 493 amino acid residue ptnr:SPTREMBL-ACC:Q9QX73 protein from Rattus
norvegicus (Rat) (COLLYBISTIN I). Public amino acid databases include the GenBank
databases, SwissProt, PDB and PIR. 

NOV68 is expressed in at least Kidney, Pituitary Gland, Placenta, Uterus, Aorta, 
Hypothalamus, Pancreas, Spleen, Epidermis, Muscle and Spinal Cord. This information was
derived by determining the tissue sources of the sequences that were included in the invention
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or
RACE sources. 

The disclosed NOV68 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 68C. 



Table 68C. BLAST results for NOV68

Gene Index/Identifier                           Protein/Organism                                Length (aa)  Identity (%)    Positives (%)   Expect
gi|18581232|ref|XP_062774.2| (XM_062774)        similar to Rho guanine nucleotide exchange      652          393/399 (98%)   395/399 (98%)   0.0
                                                factor 4, isoform a; APC-stimulated guanine
                                                nucleotide exchange factor [Homo sapiens]
gi|8809845|gb|AAF79955.1|AF249745_1 (AF249745)  RhoGEF [Homo sapiens]                           720          256/429 (59%)   319/429 (73%)   e-141
gi|5689561|dbj|BAA83064.1| (AB029035)           KIAA1112 protein [Homo sapiens]                 694          250/402 (62%)   311/402 (77%)   e-140
gi|13027402|ref|NP_076447.1| (NM_023957)        collybistin I [Rattus norvegicus]               493          254/407 (62%)   318/407 (77%)   e-140
gi|7662108|ref|NP_056000.1| (NM_015185)         Cdc42 guanine exchange factor 9; Cdc42          516          250/397 (62%)   312/397 (77%)   e-140
                                                guanine exchange factor (GEF) 9 [Homo sapiens]

Tables 68D-E list the domain descriptions from DOMAIN analysis results against
NOV68. This indicates that the NOV68 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 68D. Domain Analysis of NOV68 

gnl|Smart|smart00325, RhoGEF, Guanine nucleotide exchange factor for
Rho/Rac/Cdc42-like GTPases; Guanine nucleotide exchange factor for
Rho/Rac/Cdc42-like GTPases. Also called Dbl-homologous (DH) domain. It
appears that PH domains invariably occur C-terminal to RhoGEF/DH
domains. Improved coverage.

CD-Length = 181 residues, 100.0% aligned 

Score = 156 bits (394), Expect = 3e-39 

Table 68E. Domain Analysis of NOV68 

gnl|Smart|smart00233, PH, Pleckstrin homology domain; Domain commonly
found in eukaryotic signalling proteins. The domain family possesses
multiple functions including the abilities to bind inositol
phosphates, and various proteins. PH domains have been found to
possess inserted domains (such as in PLC gamma, syntrophins) and to be
inserted within other domains. Mutations in Brutons tyrosine kinase
(Btk) within its PH domain cause X-linked agammaglobulinaemia (XLA) in
patients. Point mutations cluster into the positively charged end of
the molecule around the predicted binding site for
phosphatidylinositol lipids.

CD-Length = 104 residues, 92.3% aligned 
Score = 56.2 bits (134), Expect = 4e-09 



The novel protein described in this application belongs to the guanine nucleotide
exchange factor family of proteins, which play a significant role in signal transduction. The
guanine nucleotide exchange factor (GEF) domain regulates GTP-binding protein
signaling. The GEF domain positively regulates the signaling cascades that utilize GTP-binding
proteins (such as those of the ras superfamily) that function as molecular switches in
fundamental events such as signal transduction, cytoskeleton dynamics and intracellular
trafficking. Experiments have shown that the GEF and PH domains of FGD1 (faciogenital
dysplasia protein) can bind specifically to the Rho family GTPase Cdc42Hs and
stimulate the GDP-GTP exchange of the isoprenylated form of Cdc42Hs. The GEF domain of
FGD1 has also been shown to activate two kinases involved in cell proliferation: the Jun
NH2-terminal kinase and the p70 S6 kinase (see Zheng et al., J. Biol. Chem. 1996 Dec
27;271(52):33169-72). Thus this novel protein may play an important role in normal
development as well as disease. This class of molecules (GEFs) is also being considered as a
good drug target, as the guanine nucleotide exchange factor RasGRP is a high-affinity target
for diacylglycerol and phorbol esters and is bound by bryostatin 1, a compound currently in
clinical trials (see Lorenzo et al., Mol. Pharmacol. 2000 May;57(5):840-6). Collybistin I and
II, which belong to the family of dbl-like GDP/GTP exchange factors (GEFs), are most
homologous to the protein described in this application. Collybistin II regulates the membrane
deposition of gephyrin (an integral membrane protein) by activating a GTPase of the Rho/Rac
family and may be an important determinant of inhibitory postsynaptic membrane formation
and plasticity (see Kins et al., Nat. Neurosci. 2000 Jan;3(1):22-9).

The disclosed NOV68 nucleic acid of the invention encoding a guanine nucleotide
exchange factor-like protein includes the nucleic acid whose sequence is provided in Table
68A or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of
whose bases may be changed from the corresponding base shown in Table 68A while still
encoding a protein that maintains its guanine nucleotide exchange factor-like activities and
physiological functions, or a fragment of such a nucleic acid. The invention further includes
nucleic acids whose sequences are complementary to those just described, including nucleic
acid fragments that are complementary to any of the nucleic acids just described. The
invention additionally includes nucleic acids or nucleic acid fragments, or complements
thereto, whose structures include chemical modifications. Such modifications include, by way
of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones
are modified or derivatized. These modifications are carried out at least in part to enhance the
chemical stability of the modified nucleic acid, such that they may be used, for example, as
antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or
variant nucleic acids, and their complements, up to about 29 percent of the bases may be so
changed.
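
The "up to about 29 percent" limit above can be read as a bound on the fraction of
positions at which a variant differs from the reference sequence. The following is a minimal
sketch of such a check, assuming an ungapped, position-by-position comparison of two
equal-length sequences. The function names and the toy sequences are hypothetical and are not
part of the disclosure.

```python
def fraction_changed(reference: str, variant: str) -> float:
    """Return the fraction of positions at which variant differs from reference.

    Assumption: both sequences have the same length and are compared
    position by position, with no alignment gaps.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    mismatches = sum(
        1 for a, b in zip(reference.upper(), variant.upper()) if a != b
    )
    return mismatches / len(reference)


def within_change_limit(reference: str, variant: str, limit: float) -> bool:
    """True if no more than `limit` (e.g. 0.29 for 29 percent) of positions differ."""
    return fraction_changed(reference, variant) <= limit


# Toy example (hypothetical sequences): 2 of 10 bases differ.
ref = "ATGCCCACCC"
var = "ATGCCGACCT"
print(fraction_changed(ref, var))           # fraction of changed bases
print(within_change_limit(ref, var, 0.29))  # compared against the 29 percent bound
```

The same comparison would apply to the protein variants by passing amino acid sequences and
the corresponding residue limit.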

The disclosed NOV68 protein of the invention includes the guanine nucleotide
exchange factor-like protein whose sequence is provided in Table 68B. The invention also
includes a mutant or variant protein any of whose residues may be changed from the
corresponding residue shown in Table 68B while still encoding a protein that maintains its
guanine nucleotide exchange factor-like activities and physiological functions, or a functional
fragment thereof. In the mutant or variant protein, up to about 38 percent of the residues may
be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this guanine nucleotide
exchange factor-like protein (NOV68) may function as a member of the "guanine nucleotide
exchange factor family". Therefore, the NOV68 nucleic acids and proteins identified here may
be useful in potential therapeutic applications implicated in (but not limited to) various
pathologies and disorders as indicated below. The potential therapeutic applications for this
invention include, but are not limited to: protein therapeutic, small molecule drug target,
antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or
prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue
regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to)
those defined here.

The NOV68 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the guanine nucleotide
exchange factor-like protein (NOV68) may be useful in gene therapy, and the guanine
nucleotide exchange factor-like protein (NOV68) may be useful when administered to a
subject in need thereof. By way of nonlimiting example, the compositions of the present
invention will have efficacy for treatment of patients suffering from cancer, trauma,
regeneration (in vitro and in vivo), viral/bacterial/parasitic infections, diabetes, autoimmune
disease, renal artery stenosis, interstitial nephritis, glomerulonephritis, polycystic kidney
disease, systemic lupus erythematosus, renal tubular acidosis, IgA nephropathy,
hypercalcemia, Lesch-Nyhan syndrome, Von Hippel-Lindau (VHL) syndrome, Alzheimer's
disease, stroke, tuberous sclerosis, hypercalcemia, Parkinson's disease, Huntington's disease,
cerebral palsy, epilepsy, Lesch-Nyhan syndrome, multiple sclerosis, ataxia-telangiectasia,
leukodystrophies, behavioral disorders, addiction, anxiety, pain, neurodegeneration, muscular
dystrophy, myasthenia gravis, atherosclerosis, aneurysm, hypertension, fibromuscular
dysplasia, stroke, scleroderma, obesity, transplantation, or other pathologies or conditions. The
NOV68 nucleic acid encoding the guanine nucleotide exchange factor-like protein of the
invention, or fragments thereof, may further be useful in diagnostic applications, wherein the
presence or amount of the nucleic acid or the protein is to be assessed.

NOV68 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV68 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-
NOVX Antibodies" section below. The disclosed NOV68 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.
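
As an illustration of how hydrophilic regions might be located from a hydrophobicity chart,
the sketch below computes a sliding-window average over a protein sequence and reports the
most hydrophilic window. It assumes the Kyte-Doolittle hydropathy scale and a 7-residue
window; the disclosure does not name a scale or window size, so both are assumptions, and the
function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 7) -> list:
    """Mean hydropathy of every full window along the sequence."""
    seq = protein.upper()
    unknown = sorted(set(seq) - set(KYTE_DOOLITTLE))
    if unknown:
        raise ValueError(f"unsupported residue codes: {unknown}")
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def most_hydrophilic_window(protein: str, window: int = 7) -> tuple:
    """Return (0-based start index, window sequence) with the lowest mean hydropathy."""
    profile = hydropathy_profile(protein, window)
    start = min(range(len(profile)), key=profile.__getitem__)
    return start, protein.upper()[start:start + window]


# Toy input (hypothetical): a charged stretch flanked by hydrophobic residues.
print(most_hydrophilic_window("IIIKKKIII", window=3))
```

A window with a strongly negative mean is only a candidate immunogen; surface exposure and
other criteria would still have to be evaluated separately.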

NOV69 

NOV69 includes two aspartate aminotransferase-like proteins disclosed below. The 
disclosed sequences have been named NOV69a and NOV69b. 

NOV69a 

A disclosed NOV69a nucleic acid of 1463 nucleotides (also referred to as CG57719-01)
encoding an aspartate aminotransferase-like protein is shown in Table 69A. The start and
stop codons are in bold letters.



Table 69A. NOV69a nucleotide sequence (SEQ ID NO: 163). 



GGAAGACTTCTGGGCAGAAGCGGAACACAGGAGCAGAGACACATAGTCTTGGCTCCAGTT
TCGTTTCAGTTATGCCCACCCTTTCAGTGTTCATGGATGTGCCCCTCGCCCACAAGCTAG
AGGGCAGCTTGTTAAAGACCTACAAACAAGATGATTACCCGAACAAGATATTCTTAGCCT
ATAGAGGCACCTTCCCACAGCCCCATGGAGTCCAGGAGAGATTTGTTTGCAGGCTGTCTG
CAGAGCTCAGCCCTGGGGGCCCAAACCAGGCATCTGGAGCTCCCTCTGTGGTTTTCCTCA 
CAGTCTGCATGACAAATGAAGGCCATCCCTGGGTTTCTCTCGTGGTGCAGAAGACTCGAC 
TACAGATTTCACAGGATCCCTCCCTGAATTATGAGTACTTGCCCACCATGGGCCTGAAAT 
CATTCATCCAGGCCTCTCTAGCACTCCTCTTTGGAAAGCACAGCCAAGCCATTGTGGAGA
ACAGGGTAGGGGGTGTACACACTGTTGGTGACAGTGGTGCCTTCCAGCTTGGCGTCCAGT 
TTCTCAGAGCTTGGCATAAGGATGCTCGTATAGTTTACATCATCTCTTCTCAAAAAGTTC 
CCACAGAACTGCATGGACTCGTCTTCCAGGACATGGGCTTTACAGTTTATGAATACTCTG 
TCTGGGACCCCAAGAAGCTATGCATGGACCCCGACATACTCCTCAATGTGGTGGAGCAGA
TCCCACATGGCTGTGTCCTTGTGATGGGGAACATTATCGACTGCAAGTTGACACCAAGTG 
GGTGGGCAAAGTTGATGTCCATGATAAAGAGCAAGCAGATATTCCCATTTTTTGATATTC
CCTGTCAAGGTTTATACACCAGTGACTTGGAAGAAGATACTAGAATCTTACAATACTTTG
TGTCTCAAGGCTTTGAGTTCTTCTGCAGCCAGTCTCTGTCCAAAAATTTTGGCATTTATG 
ATGAAGGAGTGGGGATGCTAGTGGTGGTGGCAGTCAACAACCAGCAGCTGCTGTGTGTCC 
TCTCCCAGCTGGAAGGATTAGCCCAGGCCCTGTGGCTAAACCCCCCCAACACGGGTGCAC 
GTGTCATCACCTCCATCCTCTGCAACCCTGCTCTGCTGGGAGAATGGAAGCAGAGTCTAA 
AAGAAGTTGTAGAGAACATCATGCTAACCAAGGAAAAAGTGAAGGAGAAACTCCAGCTCC
TGGGAACCCCTGGGTCCTGGGGTCACATCACCGAGCAGAGTGGGACCCACGGCTATCTTG 
GACTCAACTGTAAGCAGGTGGAATACCTGGTCAGGAAGAAGCACATCTATATCCCCAAGA
ACGGTCAGATTAACTTCAGCTGTATCAATGCCAACAACATAAATTACATCACTGAGGGCA
TCAATGAGGCTGTCCTCCTCACAGAGAGCTCAGAGATGTGTCTTCCAAAGGAAAAAAAAA
CACTGATTGGAATAAAACTTTAG 



In a search of public sequence databases, the NOV69a nucleic acid sequence, located
on chromosome 8, has 316 of 327 bases (96%) identical to a
gb:GENBANK-ID:AP000501|acc:AP000501.1 mRNA from Homo sapiens (Homo sapiens genomic DNA,
chromosome 8p11.2, clone:91h23 to 9-41). Public nucleotide databases include all GenBank
databases and the GeneSeq patent database.
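
The identity figures quoted in this section follow from the ratio of matching positions to
aligned positions. The sketch below shows that arithmetic, assuming the percentage is truncated
to a whole number, which is consistent with the 316 of 327 bases (96%) reported above. The
function name is hypothetical.

```python
def percent_identity(matches: int, aligned_length: int) -> int:
    """Whole-number percent identity, truncated rather than rounded (an assumption)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= matches <= aligned_length:
        raise ValueError("matches must be between 0 and aligned_length")
    return (100 * matches) // aligned_length


# 316 identical bases over a 327-base alignment, as reported for NOV69a.
print(percent_identity(316, 327))
```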

The disclosed NOV69a polypeptide (SEQ ID NO:164) encoded by SEQ ID NO:163
has 463 amino acid residues and is presented in Table 69B using the one-letter amino acid
code. Signal P, Psort and/or Hydropathy results predict that NOV69a has no signal peptide
and is likely to be localized in the cytoplasm with a certainty of 0.3696.



Table 69B. Encoded NOV69a protein sequence (SEQ ID NO:164). 

MPTLSVFMDVPLAHKLEGSLLKTYKQDDYPNKIFLAYRGTFPQPHGVQERFVCRLSAELS 
PGGPNQASGAPSVVFLTVCMTNEGHPWVSLVVQKTRLQISQDPSLNYEYLPTMGLKSFIQ
ASLALLFGKHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHKDARIVYIISSQKVPTEL
HGLVFQDMGFTVYEYSVWDPKKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAK
LMSMIKSKQIFPFFDIPCQGLYTSDLEEDTRILQYFVSQGFEFFCSQSLSKNFGIYDEGV
GMLVVVAVNNQQLLCVLSQLEGLAQALWLNPPNTGARVITSILCNPALLGEWKQSLKEVV
ENIMLTKEKVKEKLQLLGTPGSWGHITEQSGTHGYLGLNCKQVEYLVRKKHIYIPKNGQI 
NFSCINANNINYITEGINEAVLLTESSEMCLPKEKKTLIGIKL 
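
The relationship between the nucleotide sequence of Table 69A and the encoded protein of
Table 69B can be illustrated by translating codons from the start codon up to the first in-frame
stop codon. The sketch below uses the standard genetic code and, as a simplifying assumption,
treats the first ATG it finds as the start codon. The function name is hypothetical, and the
input is a short fragment taken from around the start codon in Table 69A.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for the first, second and third base.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(dna: str) -> str:
    """Translate from the first ATG to the first in-frame stop codon.

    Whitespace is ignored; codons with unexpected characters become 'X'.
    Returns an empty string if no ATG is present.
    """
    cleaned = "".join(dna.split()).upper()
    start = cleaned.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(cleaned) - 2, 3):
        amino_acid = CODON_TABLE.get(cleaned[i:i + 3], "X")
        if amino_acid == "*":  # stop codon ends the open reading frame
            break
        protein.append(amino_acid)
    return "".join(protein)


# Fragment spanning the start codon of the NOV69a sequence in Table 69A.
print(translate_orf("AGTTATGCCCACCCTTTCAGTGTTCATGGATGTG"))
```

Choosing the first ATG is not always correct for a full-length mRNA, so a real analysis would
normally compare candidate reading frames before selecting one.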



A search of sequence databases reveals that the NOV69a amino acid sequence has 163
of 228 amino acid residues (71%) identical to, and 187 of 228 amino acid residues (82%)
similar to, the 264 amino acid residue ptnr:TREMBLNEW-ACC:BAB24820 protein from
Mus musculus (Mouse) (ADULT MALE TESTIS CDNA, RIKEN FULL-LENGTH
ENRICHED LIBRARY, CLONE:1700083M11, FULL INSERT SEQUENCE). Public amino
acid databases include the GenBank databases, SwissProt, PDB and PIR.

NOV69a is expressed in at least testis. This information was derived by determining 
the tissue sources of the sequences that were included in the invention including but not 
limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE sources. 

NOV69b 

A disclosed NOV69b nucleic acid of 1280 nucleotides (also referred to as CG57719-02)
encoding an aspartate aminotransferase-like protein is shown in Table 69C. The start and
stop codons are in bold letters.



Table 69C. NOV69b nucleotide sequence (SEQ ID NO:165). 

CCACCCTTTCAGTTATGCCCACCCTTTCAGTGTTCATGGATGTGCCCCTCGCCCACAAGC
TAGAGGGCAGCTTGTTAAAGACCTACAAACAAGATGATTACCCGAACAAGATATTCTTAG
CCTATAGAGTCTGCATGACAAATGAAGGCCATCCCTGGGTTTCTCTCGTGGTGCAGAAGA 
CTCGACTACAGATTTCACAGGATCCCTCCCTGAATTATGAGTACTTGCCCACCATGGGCC 
TGAAATCATTCATCCAGGCCTCTCTAGCACTCCTCTTTGGAAAGCACAGCCAAGCCATTG 
TGGAGAACAGGGCAGGGGGTGTACACACTGTTGGTGACAGTGGTGCCTTCCAGCTTGGCG 
TCCAGTTTCTCAGAGCTTGGCATAAGGATGCTCGTATAGTTTACATCATCTCTTCTCAAA
AAGAACTGCATGGACTCGTCTTCCAGGACATGGGCTTTACAGTTTATGAATACTCTGTCT 
GGGACCCCAAGAAGCTATGCATGGACCCCGACATACTCCTCAATGTGGTGGAGCAGATCC
CACATGGCTGTGTCCTTGTGATGGGGAACATTATCGACTGCAAGTTGACACCAAGTGGGT 
GGGCAAAGTTGATGTCCATGATAAAGAGCAAGCAGATATTCCCATTTTTTGATATTCCCT
GTCAAGGTTTATACACCAGTGACTTGGAAGAAGATACTAGAATCTTACAATACTTTGTGT 
CTCAAGGCTTTGAGTTCTTCTGCAGCCAGTCTCTGTCCAAGAATTTTGGCATTTATGATG 
AAGGAGTGGGGATGCTAGTGGTGGTGGCAGTCAACAACCAGCAGCTGCTGTGTGTCCTCT 
CCCAGCTGGAAGGATTAGCCCAGGCCCTATGGCTAAACCCCCCCAACACGGGTGCACGTG 
TCATCACCTCCATCCTCTGCAACCCTGCTCTGCTGGGAGAATGGAAGCAGAGTCTAAAAG 
AAGTTGTAGAGAACATCATGCTAACCAAGGAAAAAGTGAAGGAGAAACTCCAGCTCCTGG 
GAACCCCTGGGTCCTGGGGTCACATCACCGAGCAGAGTGGGACCCACGGCTATCTTGGAC 
TCAACTCCCAGCAGGTGGAATACCTGGTCAGGAAGAAGCACATCTATATCCCAAAGAACG
GTCAGATTAACTTCAGCTGTATCAATGCCAACAACATAAATTACATCACTGAGGGCATCA
ATGAGGCTGTCCTCCTCACAGAGAGCTCAGAGATGTGTCTTCCAAAGGAAAAAAAAACAC 
TGATTGGAATAAAACTTTAG 

In a search of public sequence databases, the NOV69b nucleic acid sequence, located
on chromosome 8, has 401 of 620 bases (64%) identical to a
gb:GENBANK-ID:RATCASPAT|acc:D00252.1 mRNA from Rattus norvegicus (Rattus norvegicus mRNA
for cytosolic aspartate aminotransferase, complete cds). Public nucleotide databases include
all GenBank databases and the GeneSeq patent database.

The disclosed NOV69b polypeptide (SEQ ID NO: 166) encoded by SEQ ID NO: 165 
has 421 amino acid residues and is presented in Table 69D using the one-letter amino acid 
code. Signal P, Psort and/or Hydropathy results predict that NOV69b has no signal peptide 
and is likely to be localized in the cytoplasm with a certainty of 0.3645. 



Table 69D. Encoded NOV69b protein sequence (SEQ ID NO:166). 

MPTLSVFMDVPLAHKLEGSLLKTYKQDDYPNKIFLAYRVCMTNEGHPWVSLVVQKTRLQI
SQDPSLNYEYLPTMGLKSFIQASLALLFGKHSQAIVENRAGGVHTVGDSGAFQLGVQFLR
AWHKDARIVYIISSQKELHGLVFQDMGFTVYEYSVWDPKKLCMDPDILLNVVEQIPHGCV
LVMGNIIDCKLTPSGWAKLMSMIKSKQIFPFFDIPCQGLYTSDLEEDTRILQYFVSQGFE
FFCSQSLSKNFGIYDEGVGMLVVVAVNNQQLLCVLSQLEGLAQALWLNPPNTGARVITSI
LCNPALLGEWKQSLKEVVENIMLTKEKVKEKLQLLGTPGSWGHITEQSGTHGYLGLNSQQ
VEYLVRKKHIYIPKNGQINFSCINANNINYITEGINEAVLLTESSEMCLPKEKKTLIGIK
L



A search of sequence databases reveals that the NOV69b amino acid sequence has
163 of 405 amino acid residues (40%) identical to, and 236 of 405 amino acid residues
(58%) similar to, the 412 amino acid residue ptnr:SWISSNEW-ACC:P17174 protein from
Homo sapiens (Human) (ASPARTATE AMINOTRANSFERASE, CYTOPLASMIC (EC
2.6.1.1) (TRANSAMINASE A) (GLUTAMATE OXALOACETATE TRANSAMINASE-1)).
Public amino acid databases include the GenBank databases, SwissProt, PDB and PIR.

NOV69b is expressed in at least testis. This information was derived by determining 
the tissue sources of the sequences that were included in the invention including but not 
limited to SeqCalling sources, Public EST sources, Literature sources, and/or RACE sources. 



The disclosed NOV69b polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 69E.

Table 69E. BLAST results for NOV69b

Gene Index/Identifier          Protein/Organism                  Length  Identity       Positives      Expect
                                                                 (aa)    (%)            (%)

gi|12840318|dbj|BAB24820.1|    homolog to ASPARTATE              264     185/296 (62%)  185/296 (62%)  e-102
(AK006984)                     AMINOTRANSFERASE, CYTOPLASMIC
                               (EC 2.6.1.1) (TRANSAMINASE A)
                               (GLUTAMATE OXALOACETATE
                               TRANSAMINASE-1) - putative
                               [Mus musculus]

gi|345752|pir||S29028          aspartate transaminase            413     224/374 (59%)  224/374 (59%)  2e-80
                               (EC 2.6.1.1) (clone 8C7) - human

gi|91997|pir||JT0439           aspartate transaminase            413     155/374 (41%)  224/374 (59%)  2e-80
                               (EC 2.6.1.1), cytosolic - rat

gi|105387|pir||S13035          aspartate transaminase            412     155/371 (41%)  222/371 (59%)  3e-80
                               (EC 2.6.1.1) - human

gi|6754034|ref|NP_034454.1|    glutamate oxaloacetate            412     154/369 (41%)  223/369 (59%)  4e-80
(NM_010324)                    transaminase 1, soluble;
                               cytosolic aspartate
                               aminotransferase [Mus musculus]


Tables 69F-G list the domain descriptions from DOMAIN analysis results against
NOV69b. This indicates that the NOV69b sequence has properties similar to those of other
proteins known to contain this domain.



Table 69F. Domain Analysis of NOV69b 

gnl|Pfam|pfam00873, ACR_tran, AcrB/AcrD/AcrF family. Members of
this family are integral membrane proteins. Some are involved in drug
resistance.

CD-Length = 1020 residues, 15.0% aligned

Score = 51.2 bits (121), Expect = 4e-07

Table 69G. Domain Analysis of NOV69b

gnl|Pfam|pfam02460, Patched, Patched family. The transmembrane
protein Patched is a receptor for the morphogen Sonic Hedgehog. This
protein associates with the smoothened protein to transduce hedgehog signals.

CD-Length = 821 residues, 31.4% aligned

Score = 46.6 bits (109), Expect = 1e-05



Concentrations of glutamate, aspartate and glycine are significantly increased in
epileptogenic cerebral cortex. The activities of the enzymes glutamate dehydrogenase and
aspartate aminotransferase, which are involved in glutamate and aspartate metabolism, are also
increased. Polyamine synthesis is enhanced in epileptogenic cortex and may contribute to the
activation of N-methyl-D-aspartate (NMDA) receptors (See Sherwin AL (1999). Neurochem
Res 24(11):1387-95). Nuclear magnetic resonance spectroscopy (NMRS) reveals that patients
with poorly controlled complex partial seizures have a significant diminution in occipital lobe
gamma aminobutyric acid (GABA) concentration. The activity of the enzyme GABA-
aminotransaminase (GABA-T), which catalyzes GABA degradation, is not altered in
epileptogenic cortex. NMRS studies show that vigabatrin, a GABA-T inhibitor and effective
antiepileptic, significantly increases brain GABA. Glutamate decarboxylase (GAD), responsible
for GABA synthesis, is diminished in interneurons in discrete regions of epileptogenic cortex
and hippocampus. In vivo microdialysis performed in epilepsy surgery patients provides
measurements of extracellular amino acid levels during spontaneous seizures. Glutamate
concentrations are higher in epileptic hippocampi and increase before seizure onset, reaching
potentially excitotoxic levels. Frontal or temporal cortical epileptogenic foci also release
aspartate, glutamate and serine, particularly during intense seizures or status epilepticus.
GABA, in contrast, exhibits a delayed and feeble rise in the epileptic hippocampus, possibly
due to a reduction in the number and/or efficiency of GABA transporters. In addition,
aspartate aminotransferase activity is an important index for liver function. Abnormal levels
and activity of aspartate aminotransferase correlate with diseased liver conditions, e.g., hepatitis
(See Gopal et al., (2000). Postgrad Med 107(2):100-2, 105-9, 113-4; Vesely et al., (1999). Am
J Med Sci 317(6):419-24; Johnston DE (1999). Am Fam Phys 59(8):2223-30; Johnston SC,
Pelletier LL (1997). Medicine (Baltimore) 76(3):185-91). Finally, aspartate aminotransferase
activity is also a marker for diagnosis of cardiovascular diseases (See Wu AH (1999). Ann
Clin Lab Sci 29(1):18-23) and periodontal disease (See Eley BM, Cox SW (1998). Br Dent J
184(9):427-30). Therefore, aspartate aminotransferase is an excellent small molecule target
and diagnostic marker for epilepsy, liver diseases, cardiovascular and periodontal diseases.

The disclosed NOV69 nucleic acid of the invention encoding an aspartate
aminotransferase-like protein includes the nucleic acid whose sequence is provided in Table
69A or 69C or a fragment thereof. The invention also includes a mutant or variant nucleic
acid any of whose bases may be changed from the corresponding base shown in Table 69A or
69C while still encoding a protein that maintains its aspartate aminotransferase-like activities
and physiological functions, or a fragment of such a nucleic acid. The invention further
includes nucleic acids whose sequences are complementary to those just described, including
nucleic acid fragments that are complementary to any of the nucleic acids just described. The
invention additionally includes nucleic acids or nucleic acid fragments, or complements
thereto, whose structures include chemical modifications. Such modifications include, by way
of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones
are modified or derivatized. These modifications are carried out at least in part to enhance the
chemical stability of the modified nucleic acid, such that they may be used, for example, as
antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or
variant nucleic acids, and their complements, up to about 36 percent of the bases may be so
changed.

The disclosed NOV69 protein of the invention includes the aspartate
aminotransferase-like protein whose sequence is provided in Table 69B. The invention also
includes a mutant or variant protein any of whose residues may be changed from the
corresponding residue shown in Table 69B while still encoding a protein that maintains its
aspartate aminotransferase-like activities and physiological functions, or a functional fragment
thereof. In the mutant or variant protein, up to about 60 percent of the residues may be so
changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this aspartate
aminotransferase-like protein (NOV69) may function as a member of the "aspartate
aminotransferase family". Therefore, the NOV69 nucleic acids and proteins identified here
may be useful in potential therapeutic applications implicated in (but not limited to) various
pathologies and disorders as indicated below. The potential therapeutic applications for this
invention include, but are not limited to: protein therapeutic, small molecule drug target,
antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or
prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue
regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to)
those defined here.

The NOV69 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the aspartate
aminotransferase-like protein (NOV69) may be useful in gene therapy, and the aspartate
aminotransferase-like protein (NOV69) may be useful when administered to a subject in need
thereof. By way of nonlimiting example, the compositions of the present invention will have
efficacy for treatment of patients suffering from fertility, hypogonadism, or other pathologies
or conditions. The NOV69 nucleic acid encoding the aspartate aminotransferase-like protein
of the invention, or fragments thereof, may further be useful in diagnostic applications,
wherein the presence or amount of the nucleic acid or the protein is to be assessed.

NOV69 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV69 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-
NOVX Antibodies" section below. The disclosed NOV69 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.

NOV70 

A disclosed NOV70 nucleic acid of 4915 nucleotides (also referred to as CG-57462-01)
encoding a KIAA1337-like protein is shown in Table 70A. The start and stop codons are
in bold letters.



Table 70A. NOV70 nucleotide sequence (SEQ ID NO: 167). 

AATATGCCTGCCATAAAGGGGAACCGGTGGTGGCAGTGGAGCTGGTGCAGGTGTGGAAAG
TCATGAGAATCCTCCTCCTCGGGTTCATCCCCAGGTTTCCTCCCCTTCCCCTGCCCGCGC 
CTGCTTGCAGGAAGCGTGTTTACTACAGGGGCCTGGGGATGCTGGCCTCACACATCCCTG 
CCCAGCCACAGGGTACTTCCCTGAAGCCTCCTGTACCTTCAGCCCCATCCTCGATTCTCG 
CCTCCGGCTCCTCCTCCCCCCACGCCCTCCGGAATGAGCCCCGTACCCCCACCCTCACGC 
GCCCTCGCTGTATAAACGCCCTCACTTGTACTGCAAGTCCCTGCGGTCCCACCTTCAGGC 
TTCAGCATTCGCTCGACGCATCCCCCAGGCCTGCTTGCCTTGTCACTGTAGCGCCAGACC 
CAGCTTCCTTTGCGGCTCCGAGAAGCTTCCCCCTGCGACTTCCGCGAGGAGACGAGTCTG 
CGCAGCGTGGTGGCCGCCGCCCCCCGACCCTCTGCGCACTCTCTCCCGCGCCGGCGGCTC
AGCCTAGCCCCGTTCGGCCGGCCGAGACTATGGACACGGAGGATGACCCCTTGCTGCAGG
ATGTGTGGCTAGAGGAGGAGCAGGAGGAGGAAGAAGCAACGGGTGAAACCTTTTTAGGGG 
CCCAGAAGCCAGGGCCCCAACCTGGGGCAGGGGGACAGTGTTGCTGGCGGCACTGGCCCC 
TGGCTTCCCGACCCCCAGCTTCGGGCTTCTGGAGTACCCTGGGCTGGGCCTTCACCAATC 
CGTGCTGTGCTGGGCTGGTGCTCTTCCTGGGCTGCAGCATCCCCATGGCCCTGTCAGCCT 
TCATGTTCCTTTACTACCCACCGCTGGACATTGACATCTCCTACAACGCCTTTGAGATCC 
GCAACCACGAGGCCTCACAGCGTTTCGACGCTCTCACTCTGGCGCTTAAGTCCCAGTTTG 
GATCCTGGGGGCGGAACCGGCGCGATTTGGCCGACTTCACCTCCGAGACGCTTCAGCGCC 
TTATCTCAGAGCAGCTGCAGCAGCTGCATCTCGGCAACCGCTCGCGGCAAGCCTCCCGAG 
CCCCCCGCGTCATCCCCGCGGCCTCACTCGGTAGCCCAGGCCCTTACCGGGACACTTCCG 
CGGCTCAAAAGCCCACAGCCAATCGGAGCGGGCGACTTCGGCGTGAGACCCCGCCCCTGG 
AGGATCTGGCAGCCAACCAGAGTGAAGACCCGCGAAACCAGCGGCTGAGCAAGAATGGGC 
GGTACCAGCCCAGCATCCCGCCCCACGCGGCAGTCGCGGCCAATCAGAGCCGTGCCCGCC 
GAGGCGCCTCGCGCTGGGACTACTCGCGCGCCTATGTGAGTGCCAACACTCAGACGCACG 
CGCACTGGCGCATCGAGCTCATCTTCCTGGCGCGCGGCGACGCGGAGCGCAACATTTTCA 
CCAGTGAGCGCCTGGTCACGATCCATGAGATCGAGCGCAAGATCATGGACCACCCAGGCT 
TCCGGGAGTTCTGCTGGAAGCCCCACGAGGTGCTCAAGGATCTGCCGCTGGGCTCCTACT 
CCTACTGCTCGCCCCCCAGCTCGCTCATGACCTACTTTTTTCCCACCGAGAGGGGCGGCA 
AGATCTACTATGACGGCATGGGCCAGGACCTGGCGGACATCCGGGGCTCCCTGGAGCTGG 
CCATGACTCACCCTGAGTTCTACTGGTATGTGGATGAGGGCCTCTCTGCAGACAATCTGA 
AGAGCTCCCTCCTGCGCAGTGAGATCCTGTTTGGAGCACCCCTGCCCAACTACTACTCAG 
TAGATGACCGCTGGGAGGAACAACGGGCTAAGTTTCAGAGCTTCGTGGTCACCTACGTGG 
CCATGCTGGCCAAGCAGTCTACCAGCAAAGTCCAGGTTCTCTATGGGGGGACAGACCTGT 
TTGACTATGAAGTGCGCAGGACGTTCAACAATGACATGCTCCTGGCCTTCATCAGCAGCA 
GCTGCATTGCTGCCCTGGTCTACATCCTCACCTCCTGCTCAGTGTTCCTGTCCTTCTTTG 
GGATTGCCAGCATTGGTCTCAGCTGCCTGGTGGCCCTCTTCCTGTACCACGTGGTCTTTG 
GTATCCAGTACTTGGGCATCCTGAATGGGGTGGCCGCCTTCGTGATCGTGGGCATTGGTG 
TGGACGATGTCTTTGTGTTCATCAACACCTACCGCCAGGCCACCCACCTGGAAGACCCAC 
AGCTGCGCATGATCCACACCGTCCAAACTGCAGGCAAGGCCACCTTCTTCACCTCCCTGA 
CCACAGCCGCCGCCTACGCAGCTAACGTCTTCTCCCAGATCCCAGCCGTCCACGACTTTG 
GCCTGTTCATGTCTCTCATCGTGTCCTGTTGCTGGCTGGCCGTGCTTGTCACCATGCCTG 
CAGCTCTGGGCCTCTGGAGCCTCTACCTGGCACCACTGGAGAGCTCCTGCCAGACCAGCT 
GCCACCAGAATTGCAGCCGGAAGACCTCCCTGCACTTCCCCGGAGACGTGTTTGCCACTC 
CCGAGCAGGTTGGAGGCAGCCCTGCCCAGGCCCCCATACCCTACCTGGATGATGACATCC 
CCTTGCTGGAGGTCGAGGAAGAGCCAGTGTCACTGGAGCTGGGAGACGTGTCCCTGGTGT 
CTGTGTCCCCCGAGGGTCTGCAGCCAGCCTCCAACACGGGCAGCCGCGGCCATCTCATCG 
TGCAGCTGCAGGAGCTGCTGCACCACTGGGTCCTGTGGTCAGCCGTCAAGAGCCGCTGGG 
TGATTGTGGGGCTGTTCGTCTCCATCCTCATCTTGTCCCTGGTGTTCGCCAGCCGGCTCC 
GCCCCGCCAGCCGGGCCCCGCTACTCTTCCGGCCTGATACCAACATCCAGGTGCTGCTGG 
ACCTCAAGTACAACCTGAGCGCCGAGGGCATCTCCTGCATCACCTGTTCAGGTCTGTTCC 
AGGAGAAGCCCCACAGCCTGCAGAACAACATCCGGACGTCCCTGGAGAAGAAGAGGCGAG 
GCTCAGGGGTCCCCTGGGCTAGCCGGCCTGAGGCCACCCTGCAGGATTTCCCAGGCACCG 
TGTACATCTCTAAAGTGAAGAGTCAAGGCCACCCCGCTGTCTACAGGCTCTCCCTCAATG 
CCAGCCTGCCTGCTCCTTGGCAGGCTGTGTCGCCTGGGGATGGAGAGGTGCCCTCCTTCC 
AGGTGTATAGAGCGCCTTTTGGTAACTTCACCAAGAAGCTGACCGCTTGTATGTCTACAG 
TAGGGCTGCTCCAGGCGGCGAGCCCCTCCCGCAAGTGGATGCTGACGACCTTGGCCTGTG 
ATGCCAAGCGGGGCTGGAAGTTTGACTTCAGCTTCTACGTGGCCACCAAGGAGCAGCAGC 
ACACCCGGAAGCTGTACTTCGCCCAGTCCCACAAGCCCCCCTTCCACGGGCGCGTATGCA 
TGGCACCCCCTGGCTGCCTGCTTAGCTCCAGCCCCGATGGGCCTACCAAAGGCTTCTTCT 
TCGTGCCTAGTGAGAAAGTGCCCAAGGCCCGTCTCTCAGCCACCTTCGGCTTCAACCCCT 
GCGTGAACACGGGCTGCGGGAAGCCGGCGGTGCGGCCACTAGTGGATACCGGGGCCATGG 
TCTTTGTGGTCTTCGGCATTATTGGCGTCAACCGCACTCGGCAGGTGGACAACCACGTCA 
TTGGAGACCCGGGTAGTGTTGTCTACGACAGCAGCTTTGACCTCTTCAAGGAAATTGGGC 
ACCTGTGTCACCTCTGCAAGGCCATCGCAGCCAACTCCGAGCTGGTGAAGCCGGGTGGGG 
CCCAGTGCCTGCCTTCAGGCTACAGCATCTCCTCCTTCCTGCAGATGTTGCACCCTGAGT 
GCAAGGAGCTGCCCGAGCCCAACCTGCTCCCGGGGCAGCTGTCCCACGGGGCAGTGGGCG 
TCAGGGAGGGCCGCGTGCAGGAGATCTCCATGGCTTTCGAGTCGACCACGTACAAGGGCA 
AATCCTCCTTCCAGACCTACTCGGACTACCTGCGCTGGGAGAGCTTCCTCCAGCAGCAGC 
TGCAGGCCTTGCCCGAGGGCTCAGTCCTGCGCCGGGGCTTCCAGACCTGCGAGCACTGGA 
AGCAGATATTCATGGAAATCGTAGGGGTGCAGAGCGCCCTGTGCGGCCTGGTGCTATCCC 
TGCTCATCTGCGTGGCCGCGGTGGCCGTGTTCACCACCCACATCCTGCTCCTGCTGCCCG 
TGCTCCTCAGCATCTTGGGCATCGTGTGCCTGGTGGTGACCATCATGTACTGGAGCGGCT
GGGAGATGGGGGCTGTGGAAGCCATCTCCCTGTCCATCCTCGTTGGCTCCTCCGTGGATT
ACTGCGTCCACCTGGTCGAGGGCTACCTGCTGGCTGGAGAGAACCTGCCCCCCCACCAGG 
CCGAGGACGCCCGAACGCAGCGCCAGTGGCGTACGCTGGAGGCCGTGCGGCACGTGGGCG 
TGGCCATCGTCTCCAGTGCCCTCACCACGGTCATCGCCACAGTGCCCCTCTTCTTCTGCA 
TCATCGCCCCATTTGCCAAGTTCGGCAAGATTGTGGCACTCAACACGGGCGTGTCCATCC 
TCTACACGCTGACCGTCAGCACCGCCCTGCTGGGCATCATGGCGCCCAGCTCTTTCACTC 
GGACCCGGACTTCCTTCCTCAAGGCCCTGGGTGCCGTGCTGCTGGCAGGGGCCCTGGGGC 
TGGGTGCCTGCCTCGTGCTCCTGCAGAGCGGCTATAAGATTCCCCTGCCCGCAGGGGCCT 
CCCTATAGCCCGGGACGGGCTCTGGACACTTGCACCTTTGGTCCCATGGGTGGGGGACAG
GAGCTGCTTCCCAGCTCGACTTCAGCTAGCTGTGTCCCCAGGCCTGGGCCCAGGGCGCCC
TGCGGGCCAGCGTGGAGGCTGACACCCACACAGATGGTGTGGACCATGCTGCCTT 



In a search of public sequence databases, the NOV70 nucleic acid sequence, located on
chromosome 1, has 4481 of 4484 bases (99%) identical to a
gb:GENBANK-ID:AB037758|acc:AB037758.1 mRNA from Homo sapiens (Homo sapiens mRNA for
KIAA1337 protein, partial cds). Public nucleotide databases include all GenBank databases
and the GeneSeq patent database.

The disclosed NOV70 polypeptide (SEQ ID NO:168) encoded by SEQ ID NO:167
has 1561 amino acid residues and is presented in Table 70B using the one-letter amino acid
code. Signal P, Psort and/or Hydropathy results predict that NOV70 has a signal peptide and
is likely to be localized to the plasma membrane with a certainty of 0.8000. The most likely
cleavage site for a NOV70 peptide is between amino acids 18 and 19.
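
A predicted cleavage site "between amino acids 18 and 19" divides the precursor into a signal
peptide (residues 1 to 18) and the remaining chain (residue 19 onward). The sketch below shows
that split using 1-based residue numbering, applied to the first 30 residues of the NOV70
sequence in Table 70B. The function name is hypothetical, and the split reflects the predicted
site only, not an experimentally confirmed mature protein.

```python
from typing import Tuple


def split_at_cleavage_site(protein: str, last_signal_residue: int) -> Tuple[str, str]:
    """Split a protein after the given 1-based residue number.

    Returns (signal_peptide, remaining_chain).
    """
    if not 0 < last_signal_residue < len(protein):
        raise ValueError("cleavage position must fall inside the sequence")
    return protein[:last_signal_residue], protein[last_signal_residue:]


# First 30 residues of the NOV70 protein sequence (Table 70B).
n_terminus = "MRILLLGFIPRFPPLPLPAPACRKRVYYRG"
signal, remainder = split_at_cleavage_site(n_terminus, 18)
print(signal)     # residues 1-18
print(remainder)  # residues 19 onward
```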



Table 70B. Encoded NOV70 protein sequence (SEQ ID NO:168).



MRILLLGFIPRFPPLPLPAPACRKRVYYRGLGMLASHIPAQPQGTSLKPPVPSAPSSILA 
SGSSSPHALRNEPRTPTLTRPRCINALTCTASPCGPTFRLQHSLDASPRPACLVTVAPDP 
ASFAAPRSFPLRLPRGDESAQRGGRRPPTLCALSPAPAAQPSPVRPAETMDTEDDPLLQD 
VWLEEEQEEEEATGETFLGAQKPGPQPGAGGQCCWRHWPLASRPPASGFWSTLGWAFTNP 
CCAGLVLFLGCSIPMALSAFMFLYYPPLDIDISYNAFEIRNHEASQRFDALTLALKSQFG
SWGRNRRDLADFTSETLQRLISEQLQQLHLGNRSRQASRAPRVIPAASLGSPGPYRDTSA 
AQKPTANRSGRLRRETPPLEDLAANQSEDPRNQRLSKNGRYQPSIPPHAAVAANQSRARR
GASRWDYSRAYVSANTQTHAHWRIELIFLARGDAERNIFTSERLVTIHEIERKIMDHPGF 
REFCWKPHEVLKDLPLGSYSYCSPPSSLMTYFFPTERGGKIYYDGMGQDLADIRGSLELA 
MTHPEFYWYVDEGLSADNLKSSLLRSEILFGAPLPNYYSVDDRWEEQRAKFQSFVVTYVA
MLAKQSTSKVQVLYGGTDLFDYEVRRTFNNDMLLAFISSSCIAALVYILTSCSVFLSFFG
IASIGLSCLVALFLYHVVFGIQYLGILNGVAAFVIVGIGVDDVFVFINTYRQATHLEDPQ
LRMIHTVQTAGKATFFTSLTTAAAYAANVFSQIPAVHDFGLFMSLIVSCCWLAVLVTMPA 
ALGLWSLYLAPLESSCQTSCHQNCSRKTSLHFPGDVFATPEQVGGSPAQAPIPYLDDDIP
LLEVEEEPVSLELGDVSLVSVSPEGLQPASNTGSRGHLIVQLQELLHHWVLWSAVKSRWV 
IVGLFVSILILSLVFASRLRPASRAPLLFRPDTNIQVLLDLKYNLSAEGISCITCSGLFQ 
EKPHSLQNNIRTSLEKKRRGSGVPWASRPEATLQDFPGTVYISKVKSQGHPAVYRLSLNA 
SLPAPWQAVSPGDGEVPSFQVYRAPFGNFTKKLTACMSTVGLLQAASPSRKWMLTTLACD 
AKRGWKFDFSFYVATKEQQHTRKLYFAQSHKPPFHGRVCMAPPGCLLSSSPDGPTKGFFF 
VPSEKVPKARLSATFGFNPCVNTGCGKPAVRPLVDTGAMVFVVFGIIGVNRTRQVDNHVI
GDPGSVVYDSSFDLFKEIGHLCHLCKAIAANSELVKPGGAQCLPSGYSISSFLQMLHPEC 
KELPEPNLLPGQLSHGAVGVREGRVQEISMAFESTTYKGKSSFQTYSDYLRWESFLQQQL 
QALPEGSVLRRGFQTCEHWKQIFMEIVGVQSALCGLVLSLLICVAAVAVFTTHILLLLPV 
LLSILGIVCLWTIMYWSGWEMGAVEAISLSILVGSSVDYCVHLVEGYLLAGENLPPHQA 
EDARTQRQWRTLEAVRHVGVAI VS S ALTTV I ATVPLFFC I I AP FAKFGK I VALNTGVS I L 
YTLTVSTALLG I MAP S S FTRTRTS FLKALGAVLLAGALGLGACLVLLQ SG YK I PL P AGAS 
L 
A search of sequence databases reveals that the NOV70 amino acid sequence has 1436
of 1438 amino acid residues (99%) identical to, and 1436 of 1438 amino acid residues (99%)
similar to, the 1438 amino acid residue ptnr:SPTREMBL-ACC:Q9P2K9 protein from Homo
sapiens (Human) (KIAA1337 PROTEIN). Public amino acid databases include the GenBank
databases, SwissProt, PDB and PIR.
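
The identity and similarity figures quoted above are ratios of matching residues to aligned residues. As a minimal sketch, assuming the "matches/aligned (NN%)" cell layout used in the BLAST tables of this document, the ratio can be recomputed as follows. The function names are hypothetical and are not part of any BLAST tool. The whole-number percentages printed in the text may differ slightly from the unrounded value computed here.

```python
def percent_of_alignment(matches, aligned):
    """Return matches as a percentage of the aligned length (unrounded)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    if not 0 <= matches <= aligned:
        raise ValueError("matches must lie between 0 and the aligned length")
    return 100.0 * matches / aligned


def parse_blast_ratio(cell):
    """Parse a table cell such as '1436/1438 (99%)' into (matches, aligned)."""
    ratio = cell.split("(")[0].strip()  # drop the optional '(NN%)' suffix
    matches, aligned = ratio.split("/")
    return int(matches), int(aligned)


if __name__ == "__main__":
    m, a = parse_blast_ratio("1436/1438 (99%)")
    print(f"{m} of {a} residues = {percent_of_alignment(m, a):.2f}%")
```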

NOV70 is expressed in at least Brain, Cerebral Medulla/Cerebral white matter,
hippocampus, Hypothalamus, Left cerebellum, Lung, Parietal Lobe, Testis, and Right
Cerebellum. This information was derived by determining the tissue sources of the sequences
that were included in the invention including but not limited to SeqCalling sources, Public
EST sources, Literature sources, and/or RACE sources.

The disclosed NOV70 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 70C. 



Table 70C. BLAST results for NOV70

Gene Index/Identifier                        Protein/Organism                             Length (aa)  Identity (%)     Positives (%)    Expect
gi|7243055|dbj|BAA92575.1| (AB037758)        KIAA1337 protein [Homo sapiens]              1438         1436/1438 (99%)  1436/1438 (99%)  0.0
gi|17448847|ref|XP_052561.4| (XM_052561)     KIAA1337 protein [Homo sapiens]              1392         1388/1392 (99%)  1388/1392 (99%)  0.0
gi|5834578|emb|CAB55303.1| (AL117236)        hypothetical protein [Homo sapiens]          594          592/594 (99%)    592/594 (99%)    0.0
gi|17448809|ref|XP_052559.3| (XM_052559)     Similar to KIAA1337 protein [Homo sapiens]   383          361/363 (99%)    361/363 (99%)    0.0
gi|18545186|ref|XP_046122.2| (XM_046122)     protein MGC13130 [Homo sapiens]              1524         54/177 (30%)     89/177 (49%)     5e-14


Tables 70D-E list the domain descriptions from DOMAIN analysis results against
NOV70. This indicates that the NOV70 sequence has properties similar to those of other
proteins known to contain this domain.

Table 70D. Domain Analysis of NOV70

gnl|Pfam|pfam00873, ACR_tran, AcrB/AcrD/AcrF family. Members of
this family are integral membrane proteins. Some are involved in drug 
resistance. 

CD-Length = 1020 residues, 15.0% aligned 
Score = 51.2 bits (121), Expect = 4e-07 



Table 70E. Domain Analysis of NOV70 

gnl|Pfam|pfam02460, Patched, Patched family. The transmembrane
protein Patched is a receptor for the morphogen Sonic Hedgehog. This
protein associates with the smoothened protein to transduce hedgehog signals.
CD-Length = 821 residues, 31.4% aligned

Score = 46.6 bits (109), Expect = 1e-05



The Drosophila 'patched' (ptc) gene encodes a transmembrane protein that represses
transcription in specific cells of genes encoding members of the TGF beta (see 190180) and
Wnt (164820) families of signaling proteins. Vertebrate homologs of ptc have been identified
in mice, chickens, and zebra fish. The human PTC protein is predicted to contain 12
hydrophobic membrane-spanning domains and 2 large hydrophilic extracellular loops.
Johnson et al. (1996) mapped the human PTC gene to chromosome 9q22.3 by radiation hybrid
analysis.

The murine PTCH homolog, Ptc, maps to mouse chromosome 13. Ptc maps close to
the murine Facc locus (0 recombinants in 188 meioses). It was noted that mouse mutations such
as flexed tail (f), Purkinje cell degeneration (pcd), and mesenchymal dysplasia (mes), which
involve abnormal development of skeletal and neural tissues, are also located in this region of
chromosome 13 and may be allelic to Ptc. The 2 extracellular loops of the Ptc protein are
necessary for binding, and binding also requires that the Ptc protein be glycosylated.
Marigo et al. (1996) proposed that Ptc does not carry out signaling to the cell directly but that
an additional molecule is involved, namely the 7-transmembrane protein 'Smoothened' (SMO;
601500). The Ptc gene was shown to encode a candidate receptor for Shh: epitope-tagged
N-Shh binds specifically to human embryonic kidney 293 cells expressing mouse Ptc.
Basal cell carcinoma, medulloblastoma, rhabdomyosarcoma, and other human tumors
are associated with mutations that activate the protooncogene 'Smoothened' or that inactivate
the tumor suppressor 'Patched.' Smoothened and Patched mediate the cellular response to the
Hedgehog secreted protein signal, and oncogenic mutations affecting these proteins cause
excess activity of the Hedgehog response pathway.

Approximately 5% of patients with Gorlin syndrome develop medulloblastoma in the
first few years of life, and 10% of patients with medulloblastoma diagnosed at age 2 years or
under have Gorlin syndrome. Cowan et al. (1997) found that 1 out of 3 unrelated patients with
medulloblastoma complicated by Gorlin syndrome had lost the wildtype allele on 9q,
indicating that the Gorlin locus probably acts as a tumor suppressor in the development of this
tumor. They also confirmed this role in a basal cell carcinoma from the same individual.
Studying patients who presented with multiple odontogenic keratocysts, Lench et al. (1997)
identified 5 novel germline mutations in PTCH. Four mutations caused premature stop codons
and 1 resulted in an amino acid substitution toward the C terminus of the predicted protein.

The disclosed NOV70 nucleic acid of the invention encoding a KIAA1337-like
protein includes the nucleic acid whose sequence is provided in Table 70A or a fragment
thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may
be changed from the corresponding base shown in Table 70A while still encoding a protein
that maintains its KIAA1337-like activities and physiological functions, or a fragment of such
a nucleic acid. The invention further includes nucleic acids whose sequences are
complementary to those just described, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of nonlimiting example,
modified bases, and nucleic acids whose sugar phosphate backbones are modified or
derivatized. These modifications are carried out at least in part to enhance the chemical
stability of the modified nucleic acid, such that they may be used, for example, as antisense
binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic
acids, and their complements, up to about 1 percent of the bases may be so changed.

The disclosed NOV70 protein of the invention includes the KIAA1337-like protein
whose sequence is provided in Table 70B. The invention also includes a mutant or variant
protein any of whose residues may be changed from the corresponding residue shown in Table
70B while still encoding a protein that maintains its KIAA1337-like activities and
physiological functions, or a functional fragment thereof. In the mutant or variant protein, up
to about 1 percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this KIAA1337-like
protein (NOV70) may function as a member of a "KIAA1337 family". Therefore, the NOV70
nucleic acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV70 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the KIAA1337-like protein
(NOV70) may be useful in gene therapy, and the KIAA1337-like protein (NOV70) may be
useful when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from Von Hippel-Lindau (VHL) syndrome, Alzheimer's disease, Stroke, Tuberous sclerosis,
hypercalcemia, Parkinson's disease, Huntington's disease, Cerebral palsy, Epilepsy,
Lesch-Nyhan syndrome, Multiple sclerosis, Ataxia-telangiectasia, Leukodystrophies, Behavioral
disorders, Addiction, Anxiety, Pain, Neuroprotection, Endocrine dysfunctions, Diabetes,
obesity, Growth, Systemic lupus erythematosus, Autoimmune disease, Asthma, Emphysema,
Scleroderma, allergy, Fertility, ARDS, Pharyngitis, Laryngitis, Myasthenia gravis, or other
pathologies or conditions. The NOV70 nucleic acid encoding the KIAA1337-like protein of
the invention, or fragments thereof, may further be useful in diagnostic applications, wherein
the presence or amount of the nucleic acid or the protein is to be assessed.

NOV70 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV70 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV70 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.

NOV71 

A disclosed NOV71 nucleic acid of 2004 nucleotides (also referred to as CG57584-01) 
encoding a zona pellucida glycoprotein 1 precursor-like protein is shown in Table 71A. The
start and stop codons are in bold letters. 



Table 71A. NOV71 nucleotide sequence (SEQ ID NO:169). 



CACCCACTTGCCACCTCCTCCATAAAAGGCCTGGGCCCAGCTCTGGTGGCGAGGGAGTAG
GGGGTGTGTCTGTGGCGTCTCATGGCAGGAGGCTCAGCCACGACCTGGGGTTACCCTGTG
GCCCTGCTACTGCTGGTCGCCACCCTGGGGCTGGGTAGCTACGACTGTGGGATCAAGGGA 
ATGCAGCTGCTGGTGTTCCCCAGGCCAGGCCAGACTCTCCCCTTCAAGGTGGTGGATGAA 
TTTGGGAACCGATTTGATGTCAACAACTGCTCCATCTGCTACCACTGGGTCACCTCCAGG 
CCGCAGGAGCCTGCAGTCTTCTCGGCCGATTACAGAGGCTGCCACGTGCTGGAGAAGGAT 
GGGCGTTTCCACCTGAGGGTGTTCATGGAGGCTGTGCTGCCCAATGGTCGTGTGGATGTG 
GCACAAGACGCTACTCTGATCTGTCCCAAACCTGACCCCTCCCGGACTCTGGACTCCCAG 
CTGGCACCACCCGCCATGTTCTCTGTCTCAACCCCACAAACCCTTTCCTTCCTCCCCACC 
TCTGGCCATACCTCCCAAGGCTCTGGCCATGCCTTTCCCAGCCCACTGGACCCAGGGCAC 
AGCTCTGTCCACCCAACCCCTGCTTTACCATCCCCTGGACCTGGACCTACCCTCGCCACC 
CTGGCTCAACCCCACTGGGGCACCTTGGAACACTGGGATGTGAACAAACGAGATTACATA 
GGTACCCACCTGAGCCAGGAGCAGTGCCAGGTGGCCTCAGGGCACCTCCCCTGCATCGTG
AGAAGAACTTCAAAAGAAGCCTGTCAGCAGGCTGGCTGCTGCTATGACAACACCAGAGAG 
GTTCCCTGTTACTATGGCAACACAGCTACTGTCCAGTGCTTCAGAGATGGCTACTTCGTC 
CTCGTAGTGTCCCAAGAAATGGCCTTGACACACAGGATCACACTGGCCAACATCCACCTG 
GCCTATGCCCCCACCAGCTGCTCCCCAACACAGCACACGGAAGCTTTCGTGGTCTTCTAC 
TTCCCTCTCACCCACTGTGGAACCACAATGCAGGTGGCTGGCGACCAGCTCATCTATGAG 
AACTGGCTGGTGTCTGGCATCCACATCCAAAAGGGGCCACAGGGTTCCATCACGCGGGAC 
AGCACCTTCCAGCTTCATGTGCGCTGTGTCTTCAACGCCAGTGACTTCCTGCCCATTCAG 
GCATCCATTTTCCCACCCCCATCGCCTGCTCCTATGACCCAGCCCGGCCCCCTGCGGCTT 
GAGCTGCGGATTGCCAAAGACGAGACCTGCAGCTCGTACTATGGGGAGGATGACTATCCC 
ATCGTGAGGCTGCTCCGAGAACCAGTCCATGTGGAGGTCCGGCTTCTGCAGAGGACAGAC 
CCCAACCTGGTCCTGCTGCTGCACCAGTGCTGGGGCGCTCCCAGTGCCAACCCCTTCCAG 
CAGCCCCAGTGGCCCATCCTGTCAGACGGATGCCCCTTCAAGGGCGACAGCTACAGAACG 
CAAATGGTAGCCTTGGACGGGGCCACACCTTTCCAGTCGCACTACCAGCGATTCACTGTT 
GCTACCTTCGCCCTCCTGGACTCAGGCTCCCAGAGAGCCCTCAGAGGACTGGTTTACTTG 
TTCTGCAGCACCTCTGCCTGCCACACCTCAGGGCTGGAGACTTGCTCCACTGCATGTAGC 
ACTGGCACTACAAGACAGCGACGATCCTCAGGTCACCGTAATGACACTGCCAGGCCCCAG 
GACATCGTGAGCTCTCCGGGGCCAGTGGGCTTTGAGGATTCTTATGGGCAGGAGCCCACA 
CTTGGGCCCACAGACTCCAATGGGAACTCCAGCCTGAGACCTCTCCTTTGGGCGGTCCTT 
TTGCTGCCAGCTGTTGCCCTGGTCCTTGGGTTTGGTGTCTTTGTGGGCCTGAGCCAGACC 
TGGGCCCAGAAGCTCTGGGAAAGCAACAGACAGTGAATGGGCCCAATAAACAATCATTTC
CAACCTACTGAAAAAAAAAAAAAA 
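
The relationship between the nucleotide listing and the encoded polypeptide can be illustrated with a short translation sketch. It uses the standard genetic code and, as a simplifying assumption, starts at the first ATG found and stops at the first in-frame stop codon. The specification marks the actual start and stop codons in bold, and that marking does not survive in plain text. The function name `translate_first_orf` is hypothetical. The input fragment is the second 60-base row of Table 71A.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_first_orf(dna):
    """Translate from the first ATG to the first in-frame stop codon.

    Whitespace is ignored; codons containing unexpected characters
    are reported as 'X'. Returns '' if no ATG is present.
    """
    dna = "".join(dna.split()).upper()
    start = dna.find("ATG")
    if start < 0:
        return ""
    residues = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":  # stop codon ends the open reading frame
            break
        residues.append(aa)
    return "".join(residues)


if __name__ == "__main__":
    row = "GGGGTGTGTCTGTGGCGTCTCATGGCAGGAGGCTCAGCCACGACCTGGGGTTACCCTGTG"
    print(translate_first_orf(row))
```

Taking the first ATG is only a heuristic. For a real sequence the annotated start position should be used instead.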



In a search of public sequence databases, the NOV71 nucleic acid sequence, located on
chromosome 11, has 1305 of 1704 bases (76%) identical to a
gb:GENBANK-ID:MOZPl|acc:U20448.1 mRNA from Mus musculus (Mus musculus ZP1 precursor (Zp-1)
mRNA, complete cds). Public nucleotide databases include all GenBank databases and the
GeneSeq patent database.

The disclosed NOV71 polypeptide (SEQ ID NO:170) encoded by SEQ ID NO:169 has
624 amino acid residues and is presented in Table 71B using the one-letter amino acid code.
Signal P, Psort and/or Hydropathy results predict that NOV71 has a signal peptide and is
likely to be localized at the plasma membrane with a certainty of 0.4600. The most
likely cleavage site for a NOV71 peptide is between amino acids 25 and 26.
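
A predicted cleavage site between residues 25 and 26 divides the precursor into a 25-residue signal peptide and a mature chain starting at residue 26. The sketch below shows that split on the first 60 residues of the NOV71 sequence. The helper name `split_signal_peptide` is hypothetical, and the split is purely positional; it performs no prediction of its own.

```python
def split_signal_peptide(protein, last_signal_residue):
    """Split a precursor at a cleavage site given in 1-based residue numbering.

    last_signal_residue is the final residue of the signal peptide, so a site
    'between amino acids 25 and 26' corresponds to last_signal_residue=25.
    """
    if not 0 < last_signal_residue < len(protein):
        raise ValueError("cleavage site must fall inside the sequence")
    return protein[:last_signal_residue], protein[last_signal_residue:]


if __name__ == "__main__":
    # First 60 residues of the NOV71 polypeptide (SEQ ID NO:170).
    n_terminus = "MAGGSATTWGYPVALLLLVATLGLGSYDCGIKGMQLLVFPRPGQTLPFKVVDEFGNRFDV"
    signal, mature = split_signal_peptide(n_terminus, 25)
    print("signal peptide:", signal)
    print("mature chain starts:", mature[:8])
```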

Table 71B. Encoded NOV71 protein sequence (SEQ ID NO:170). 

MAGGSATTWGYPVALLLLVATLGLGSYDCGIKGMQLLVFPRPGQTLPFKVVDEFGNRFDV 
NNCSICYHWVTSRPQEPAVFSADYRGCHVLEKDGRFHLRVFMEAVLPNGRVDVAQPATLI
CPKPDPSRTLDSQLAPPAMFSVSTPQTLSFLPTSGHTSQGSGHAFPSPLDPGHSSVHPTP
ALPSPGPGPTLATLAQPHWGTLEHWDVNKRDYIGTHLSQEQCQVASGHLPCIVRRTSKEA
CQQAGCCYDNTREVPCYYGNTATVQCFRDGYFVLVVSQEMALTHRITKANIHLAYAPTSC
SPTQHTEAFVVFYFPLTHCGTTMQVAGDQLIYENWLVSGIHIQKGPQGSITRDSTFQLHV
RCVFNASDFLPIQASIFPPPSPAPMTQPGPLRLELRIAKDETCSSYYGEDDYPIVRLLRE
PVHVEVRLLQRTDPNLVLLLHQCWGAPSANPFQQPQWPILSDGCPFKGDSYRTQMVALDG 
ATPFQSHYQRFTVATFALLDSGSQRALRGLVYLFCSTSACHTSGLETCSTACSTGTTRQR 
RSSGHRNDTARPQDIVSSPGPVGFEDSYGQEPTLGPTDSNGNSSLRPLLWAVLLLPAVAL 
VLGFGVFVGLSQTWAQKLWESNRQ 

A search of sequence databases reveals that the NOV71 amino acid sequence has 422 
of 620 amino acid residues (68%) identical to, and 477 of 620 amino acid residues (76%) 
similar to, the 623 amino acid residue ptnr:SPTREMBL-ACC:Q62016 protein from Mus 
musculus (Mouse) (ZONA PELLUCIDA GLYCOPROTEIN 1 PRECURSOR (ZP1)). Public 
amino acid databases include the GenBank databases, SwissProt, PDB and PIR. 

NOV71 is expressed in at least Kidney, Pituitary Gland, Testis, Whole Organism.
Expression information was derived from the tissue sources of the sequences that were
included in the derivation of the sequence of CG57584-01. The sequence is predicted to be
expressed in the following tissues because of the expression pattern of (GENBANK-ID:
gb:GENBANK-ID:MOZPl|acc:U20448.1) a closely related Mus musculus ZP1 precursor (Zp-1)
mRNA, complete cds homolog in species Mus musculus: ovary. This information was
derived by determining the tissue sources of the sequences that were included in the invention
including but not limited to SeqCalling sources, Public EST sources, Literature sources, and/or
RACE sources.

The disclosed NOV71 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 71C. 
Table 71C. BLAST results for NOV71

Gene Index/Identifier                        Protein/Organism                                              Length (aa)  Identity (%)     Positives (%)    Expect
gi|17460872|ref|XP_062198.1| (XM_062198)     similar to zona pellucida glycoprotein 1 (H. sapiens)         496          458/458 (100%)   458/458 (100%)   0.0
gi|6677653|ref|NP_033606.1| (NM_009580)      zona pellucida glycoprotein 1 [Mus musculus]                  623          416/628 (66%)    471/628 (74%)    0.0
gi|1113794|gb|AAB60507.1| (U24230)           zona pellucida [Mus musculus]                                 623          414/628 (65%)    470/628 (73%)    0.0
gi|16758240|ref|NP_445961.1| (NM_053509)     zona pellucida glycoprotein 1 [Rattus norvegicus]             617          416/621 (66%)    469/621 (74%)    0.0
gi|11140012|emb|CAC16087.1| (AJ289697)       zona pellucida protein 1 [Gallus gallus]                      934          197/369 (53%)    254/369 (68%)    2E-97



Tables 71D-E list the domain descriptions from DOMAIN analysis results against 
NOV71. This indicates that the NOV71 sequence has properties similar to those of other
proteins known to contain this domain. 



Table 71D. Domain Analysis of NOV71 

gnl|Pfam|pfam00100, zona_pellucida, Zona pellucida-like domain.
CD-Length = 266 residues, 99.6% aligned 

Score = 209 bits (533), Expect = 3e-55 



Table 71E. Domain Analysis of NOV71 

gnl|Smart|smart00018, P, P or trefoil or TFF domain; Proposed role in
renewal and pathology of mucous epithelia 

CD-Length = 47 residues, 100.0% aligned 

Score = 39.7 bits (91), Expect = 5e-04 



The mammalian zona pellucida is composed of 3 major glycoproteins, ZP1, ZP2
(182888), and ZP3 (182889). ZP3, the molecule responsible for the major sperm-receptor
activity of the zona, plays a significant role in fertilization. ZP2 is implicated as a secondary
sperm receptor that binds sperm only after the induction of the sperm acrosome reaction. See
review by Dean (1992). The mature ZP1, ZP2, and ZP3 proteins have molecular weights of
90-110 kD, 64-76 kD, and 57-73 kD, respectively.
The disclosed NOV71 nucleic acid of the invention encoding a zona pellucida
glycoprotein 1 precursor-like protein includes the nucleic acid whose sequence is provided in
Table 71A or a fragment thereof. The invention also includes a mutant or variant nucleic acid
any of whose bases may be changed from the corresponding base shown in Table 71A while
still encoding a protein that maintains its zona pellucida glycoprotein 1 precursor-like
activities and physiological functions, or a fragment of such a nucleic acid. The invention
further includes nucleic acids whose sequences are complementary to those just described,
including nucleic acid fragments that are complementary to any of the nucleic acids just
described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 24 percent of the
bases may be so changed.

The disclosed NOV71 protein of the invention includes the zona pellucida glycoprotein
1 precursor-like protein whose sequence is provided in Table 71B. The invention also
includes a mutant or variant protein any of whose residues may be changed from the
corresponding residue shown in Table 71B while still encoding a protein that maintains its
zona pellucida glycoprotein 1 precursor-like activities and physiological functions, or a
functional fragment thereof. In the mutant or variant protein, up to about 32 percent of the
residues may be so changed.
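
The percentage limits stated for NOV71 variants can be turned into approximate position counts. The sketch below assumes the limits apply to the full lengths given in this section (2004 nucleotides and 624 residues) and rounds down. Both are interpretive assumptions, since the text says only "up to about". The function name is hypothetical.

```python
def max_changed_positions(length, percent):
    """Largest whole number of positions not exceeding percent% of length."""
    if length < 0 or not 0 <= percent <= 100:
        raise ValueError("length must be >= 0 and percent within 0..100")
    return length * percent // 100  # integer arithmetic, rounds down


if __name__ == "__main__":
    # NOV71 nucleic acid: 2004 nucleotides, up to about 24 percent changed.
    print(max_changed_positions(2004, 24))
    # NOV71 protein: 624 residues, up to about 32 percent changed.
    print(max_changed_positions(624, 32))
```

The 24 and 32 percent limits equal 100 minus the 76% nucleotide identity and 68% amino acid identity reported for the closest mouse ZP1 sequences. The specification does not state this relationship; it is an observation from the figures.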

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this zona pellucida
glycoprotein 1 precursor-like protein (NOV71) may function as a member of a "zona pellucida
glycoprotein 1 precursor family". Therefore, the NOV71 nucleic acids and proteins identified
here may be useful in potential therapeutic applications implicated in (but not limited to)
various pathologies and disorders as indicated below. The potential therapeutic applications
for this invention include, but are not limited to: protein therapeutic, small molecule drug
target, antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic
and/or prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue
regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to)
those defined here.

The NOV71 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the zona pellucida
glycoprotein 1 precursor-like protein (NOV71) may be useful in gene therapy, and the zona
pellucida glycoprotein 1 precursor-like protein (NOV71) may be useful when administered to
a subject in need thereof. By way of nonlimiting example, the compositions of the present
invention will have efficacy for treatment of patients suffering from Cardiomyopathy,
Atherosclerosis, Hypertension, Congenital heart defects, Aortic stenosis, Atrial septal defect
(ASD), Atrioventricular (A-V) canal defect, Ductus arteriosus, Pulmonary stenosis, Subaortic
stenosis, Ventricular septal defect (VSD), valve diseases, Tuberous sclerosis, Scleroderma,
Obesity, Transplantation, Aneurysm, Fibromuscular dysplasia, Stroke, Anemia, Bleeding
disorders, Adrenoleukodystrophy, Congenital Adrenal Hyperplasia, Diabetes, Von
Hippel-Lindau (VHL) syndrome, Pancreatitis, Hyperparathyroidism, Hypoparathyroidism, SIDS,
Endometriosis, Fertility, Xerostomia, Hypercalcemia, Ulcers, Cirrhosis, Inflammatory bowel
disease, Diverticular disease, Hirschsprung's disease, Crohn's Disease, Appendicitis,
Hemophilia, hypercoagulation, Idiopathic thrombocytopenic purpura, autoimmune disease,
allergies, immunodeficiencies, Graft versus host, Ataxia-telangiectasia, Hemophilia,
Lymphedema, Tonsillitis, Osteoporosis, Arthritis, Ankylosing spondylitis, Scoliosis,
Tendinitis, Muscular dystrophy, Lesch-Nyhan syndrome, Myasthenia gravis, Dental disease
and infection, Alzheimer's disease, Tuberous sclerosis, Parkinson's disease, Huntington's
disease, Cerebral palsy, Epilepsy, Lesch-Nyhan syndrome, Multiple sclerosis,
Ataxia-telangiectasia, Leukodystrophies, Behavioral disorders, Addiction, Anxiety, Pain,
Neuroprotection, Growth and reproductive disorders, Endocrine dysfunctions, Systemic lupus
erythematosus, Asthma, Emphysema, ARDS, Pharyngitis, Laryngitis, Hearing loss, Tinnitus,
Psoriasis, Actinic keratosis, Tuberous sclerosis, Acne, Hair growth, alopecia, pigmentation
disorders, cystitis, incontinence, Renal artery stenosis, Interstitial nephritis,
Glomerulonephritis, Polycystic kidney disease, Systemic lupus erythematosus, Renal tubular
acidosis, IgA nephropathy, Vesicoureteral reflux, glaucoma, blindness, and Hypothyroidism,
or other pathologies or conditions. The NOV71 nucleic acid encoding the zona pellucida
glycoprotein 1 precursor-like protein of the invention, or fragments thereof, may further be
useful in diagnostic applications, wherein the presence or amount of the nucleic acid or the
protein is to be assessed.
NOV71 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV71 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV71 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.

NOV72 

A disclosed NOV72 nucleic acid of 6108 nucleotides (also referred to as CG56761-01) 
encoding an ankyrin repeat containing protein-like protein is shown in Table 72A. The start 
and stop codons are in bold letters. 



Table 72A. NOV72 nucleotide sequence (SEQ ID NO:171). 



ATGGCCAGTCACGTGGACCTGCTGACGGAGCTGCAGCTGCTGGAGAAGGTGCCCACGCTG 
GAGCGGCTGCGGGCTGCCCAGAAGCGCCGGGCCCAGCAGCTGAAGAAATGGGCACAGTAC 
GAGCAGGACTTGCAGCACCGCAAGCGAAAGCATGAGCGGAAGCGCAGCACGGGCGGCCGC 
CGCAAGAAAGTGTCCTTCGAGGCCAGCGTGGCCCTGCTGGAGGCCTCGCTGAGGAACGAC 
GCCGAGGAAGTACGCTACTTCCTGAAGAATAAGGTCAGCCCTGATTTGTGCAATGAGGAC 
GGACTCACAGCCCTACACCAGTGCTGCATCGACAACTTTGAGGAAATTGTGAAGCTGCTC 
CTCTCCCATGGTGCCAATGTGAACGCCAAGGACAACGAGCTGTGGACACCTCTCCATGCT 
GCAGCCACCTGCGGCCACATCAACCTGGTGAAGATCCTCGTTCAGTATGGGGCCGACTTG 
CTTGCTGTCAACTCGGATGGGAACATGCCATATGACCTCTGCGAGGATGAACCCACCCTG 
GATGTCATCGAGACCTGCATGGCATACCAGGGCATCACCCAAGAGAAAATCAACGAGATG 
CGGGTGGCTCCTGAGCAGCAGATGATTGCGGACATCCACTGCATGATCGCAGCGGGCCAG 
GACCTGGACTGGATAGATGCCCAGGGTGCCACACTGCTGCACATAGCTGGAGCCAATGGA 
TACCTGCGGGCAGCTGAGCTCCTCCTGGATCATGGAGTGCGTGTGGATGTGAAGGACTGG 
GATGGCTGGGAGCCCCTGCATGCAGCTGCCTTCTGGGGACAGATGCAGATGGCAGAGCTA 
TTGGTGTCCCATGGAGCTAGTCTCAGTGCAAGGACATCCATGGATGAGATGCCAATAGAC
CTGTGCGAGGAGGAAGAGTTCAAGGTCCTGCTGCTGGAGCTAAAACACAAGCATGATGTG 
ATCATGAAGTCACAGCTGAGGCACAAGTCATCCTTGAGCCGGAGGACCTCCAGCGCAGGC 
AGCCGTGGGAAGGTGGTGCGGCGAGCCAGCCTGTCGGACAGGACCAACCTGTATAGGAAG 
GAGTATGAGGGAGAGGCCATCCTGTGGCAGCGGAGTGCAGCTGAGGATCAGCGGACCTCC 
ACCTACAACGGGGACATCAGGGAGACCAGGACAGACCAAGAGAATAAGGACCCTAACCCC 
AGGCTGGAGAAGCCCGTGCTACTCTCCGAATTTCCTACCAAGATCCCACGAGGTGAACTG 
GACATGCCTGTTGAGAATGGCCTCCGGGCTCCGGTCAGTGCCTACCAGTATGCGCTGGCC 
AACGGGGATGTCTGGAAGGTGCATGAGGTGCCTGACTACAGCATGGCCTATGGCAACCCT 
GGCGTGGCCGACGCCACCCCGCCCTGGAGCAGCTACAAGGAACAGAGCCCTCAGACGCTT 
CTGGAGCTGAAGCGGCAGCGGGCTGCAGCCAAGCTGCTCAGCCACCCCTTCCTTAGCACA 
CACCTGGGCAGCAGCATGGCCAGGACGGGCGAGAGTAGCAGTGAAGGCAAGGCCCCCTTG 
ATCGGAGGCAGAACTTCACCGTACAGCAGCAATGGGACCTCGGTATATTACACGGTCACC 
AGCGGAGATCCCCCACTCTTAAAGTTCAAGGCCCCCATAGAGGAGATGGAGGAGAAGGTG 
CATGGCTGTTGCCGTATCTCCTAGTCTCCGTGTGATGGAGGAGGGAGATGCCTGGGGAGG
GGCTCCTGGAATCCAGGCCAGCCCAACAGCCCTGGCTGGGGAGGTGTCAGGGCAGCTGGG 
GAGAGGTGGGCTCTGCTTTTCAGAGGAACTCAGACCCCAGCCCTCAGCTGGCTGCCCATA 
GCATCCCATGTCCCACGTCCCGTGGTTCTGCTTCCTGCTGCATCGTCTGCCATCTGACAC 
AAGGCCTGTCGTGGCCTCCTGGTTCACTCTGCTGTCTGATCTTGGGAGGGTGGGCTTGAG 
ATCCCAGCTCTATTCTTGGTATAAAGGCTTCTCCGGATCAGTACATGCATGTCACATTAA
CACACACACACACACACATATACACACACACACAAGCTCGATCAGTGTGTGTAGGAATGA 
CATACCTGGGCTCAGGGGAAGCAAGGGGGCTTAGAATTTGTGGGGTATTCCCAAAAGGAT 
GGAAGTTAAGACTCAGAGTCTCATTACCACTGCCAATGTGGTTTTAGCAGGGGAGGGGAC 
CTGCTAAGCTGAGACCCATAGTCCTGCTCAGAGTTATCCCAAAGTCTGAGCCACCAGCCA 
CACCTGACAGGGGTGAGAAGTCCTCGCTGTGTTCAGAGGGAGCCAGGAATCTACATGGGT 
AGATGAGATAGACACAGACCTGCTCCCCGCAGCCTTGTTGAGAGCCACACTTCTGCCCAT 
GCCAGGAGCCAGCTGTGTGACCATCCAGGGGTGGAGGGGGAAAACCAGGCAATTTCGTTC 
CTGGAATCAACCAAATCATGTTTTCCTCTTGGATGGAAGTGTCAAAGGCAGAAGGGTGTG 
GGAGGGGGACAAGGTCAGTATTTACCAAAGTGTATCTGATTTTAAAAATTCCTTTAGTCT 
GTAAAACTCCTAGAGGGAGGGAGGTAACTGAATTCACTTCTTTTTGTGGATCGTATCAAG 
GTCACTGGGTTTTACTGGCTGGTGCTGGGAAAATGAAGCTAAGTGAGGAGCTTCCATTGG 
AATGCTTTTCCAGGGAGAGAGGCCAGTTAATTTAAAAAAAACAGTCGCTAGTTAACAGCG 
ACAGAGCCCAGCACCCTGGGGTCTTTGTGAATATCCAGACTGTTTCAGCCCAGCCCATCT 
CAGCCAACCCTCCTTAGACTGAGCTGTCAGAGCAAGCAATTAGGGGCCAGCCTGCCTCCA 
CCTCCCACCCCCTTCCACCTCCATCAGTCATGTGTGCAGAGTCAGTGCTCGGGATCCCGG 
GCCCAGCTTTTGCCTTTTTGGGGATGCTTGGTGAGACAGATTTGCCAGTCAGCCCTTTTG 
AGTTCCCGCCTCACCCAGGGGCTCCCAGCCTGCACTTGCAGGAGTGGTGATGCCCCAAGT 
CTGCGAATCCAGGGTGCACGTGGTCAATATCCCCTCCTGCATTCAGGAGAGCCATGGTAG 
GGCTGGAGTTGGGTCTTGCCCAGCCCTGCAGTTTCATAGTCCCAGCCTTCCTGGTGCTGG 
GGAGGGAGGACTGTGAATGGCTGTTCTCCCCTCACTGCTGAGTCTCCCAGGACCCCCTTT 
GGAGATGCCCATGGCATGGGCACTGCCCACAGGCTCAGCCAGAACCTCTTGGTGTACCCG 
ATAAGCTGCAGGTTATCCCTTGCTCTGTGCGCCTTTTATTTGTCCTTAAACTACCTCCTT 
AGAGCTCTGAAGGGGTCTCCTAGTTCCAGATTTTAATTTGGGGAACAGATCTGGGTTCTT 
TTTAACCCTCTTCTTTCTCAGTCTATGAGAAACTTGCCCTGAGGGGCACCTGGGCTAGGG 
GCTTGGGACTGGAAGACCATCCCCGCCTTGTGCCACAACTTTGGTCATGGGATCTGCTCT 
TTGTCATTCTTAGCCCCCTACTGTGGCCCCCATAGCCCCATAACCCAGAGAGGGAGCTGG
ACTTCAGGGAGCCTGAGTGATGCTTTCCCAGGAGCAGGGCAGCTGGCTGGACCAGAAAGT 
AGAGGGCCCATGGGAGTGACTGCACCCTTGGTGGCTGCTGGAAGGGGAGAGGTTCTCAGC 
ATCAGGCCACCTCCACCCCAATGCCAGGATAGATGTATTCTAGAGTAGGGGTGGAGGCGG 
CCCAGGAGGCTGAAGACAGGTGCACAGATGCTTCCCACGACCTTGCCATTTGGGGTGGGC 
TCTTCAACATCTCAGGCTGTGGCTGGAACAGGACAGGATGATCTAAAACACACGTACCAT 
TGGCTGTAAAACAGTATGAGCCCAGACTGACGCTGAAATCCCTCATGAGCCAACCTTAGC
TACAAGGTAGGGAGTTCTGAGGGAAGCCGCGTGCTCCTCAGGAGAGAGCTGTTTAGGTTT 
TCCGATCTTTTTGCTCAGGGGCCAAACACTGAAGGCACGTACTGCCCAACCCACTGAGCG 
CCTGAGGCCATTCCCTCCTTTTCCGCATGCCTCCTGCCTCCTGGGCTATTCCTCTCCACC 
CAGAAGGCTGGGAATCCCAGCTGATTCCCTGACAGGAGCCGACTTCACACACAGGTGACT 
CTCAGGCATTGGCTCATGTTTTCAGCCAGGGATAAACCATCCCTTCTTGGGGCTTTAAGT 
CCCTGGGGAGCTTTCCCTGTAGGTCTCCTGGGTGTTGAGAGACAAGTTGGAGACCAACCT
CCAATGAATGAGCCGCGGTCATTCATTAATTCACTCACGTAATTTACTGAGTAGCTGCAA 
CATGCCAGCCTCTACGTTAGGTTCTGCGGATAAAGGAGGAATAAGACAGAGTCAGGAGAA 
CTGTTCCTTGTGGTTTCCGTCCCTTGGGGACCACAGGCATCAGCAGTCCCATTCAAGTCA 
CCTGAGGCAAAGTGTCTGCATCTTCGTCCAGCGACCCTTTGCTTTTCGGCTCCTAGAATC 
CTTAGAGTCTGAATTCCTTTAGCTGGGAACAGCTGTCATGGTCACCCCTGGATAACATTT 
GCCACCAAGTATAGATGCTGGATCTTGGGTTCCAGGCAGACATCATCCAGGTCCATCTGG
AACTTTCAGTGATAGCTGCCTTCAGCCAGCATCTTTGGGGGACTCTATAATAGCAGCTTG 
AGATCAGTGTCTAGAAGACTGTTCTGCAATTTGCTGCCAAATGCATCTCAGGTTTTTAAA 
GTCATTGTTTCTTGCTCATGGTGGCTCATTTATTACATAGTCCCCTCACCCCACTAATGG 
ATAATGGGAGGAAAAGTTGCTGCTTCCTTCAGCATCAAAGCCTTTCCTTGGGAATCTGCC 
TCCCTCCATGGCAGGGGTGGATTCGGGAGCTGGGAGTAACCAGGCAAAGTCAACCAGATG 
CCTAGCTCCTGCTGAGACCCAGGTCCTATGGCAGCTCCTCATTAGATTAAAGGAGACCAC 
TTCCAAAGCAGGTGCTGCATGGCTCACCATCATATGCCCCAAACAACTGAAAGTTGGCGG 
TTATCACCAGACTGTGAGTTTCTGGCAAGTAGCTTGGGGAAGCTGAATAAACTCTAGGCC 
CAGGGCTACTAAAGACTTCAGGATAGAATTCTCCATCAAATATACAGCATAAGTAAAACT 
GCTCTGCACTGTTTAATCCATTTCCAAGGGGCTTAGAAAAGCTAACAAGGGTGTGTCCCC 
TGTCCTGCCCCACCGGTTTGCTGGCTTTGTAATAACATAAGACCATTGTGGTTGTTGGTG 
TCAGATACCTTCCCATCCTGAGCTCTCTCACCTACCTGCTCTCTCTCCTAGAGCAGGATA 
CTGGGGTACTTTTAAGAAGGGTGCTCCTTTTAAGATGCCCAGAAAAGCTGTATTTAACTC 
TTGCTATTTGTAACTTGGGGATGGTCTCCCCTGCCCCAGGGCACATAAGAGCAAAGGCTC 
CAATGGTCAGTGGATGACTCTGCAAAAGTGACCCCCTGTGCCAGAAGCTATAGCCCTCTC 
CCCAACAGGTCTCTCTTGTTGGCCAGAGGGCCTGCTTCCCATGGGCATTGCAAGTGCCAC
CGTGCGGGGCCTGGCTCTGCACACCCAGGAAAAGTCTGCAGACCCCCAGCCCTCCGCAAT 
AATTCACCAGACCAGAAGCCACTGGTGTACAGAGAACACTTAAAAAAATGTATTTTATGT
GAAAAAAAATTAAAACTCTGTATACTGTATCAGCAGCTTTGTGTAAAAATGGCAATCAAG 
AGAGTCTAATATATTTAAAACTTTTTTAAAAAAAATCTTCGCAGATCTTTGATATCGTAC 
TGAGGTAACTTCCACGTAGCCCCTTGCCACGCGGCACCGGTGGGCCTTGGGTCCAAAACT
GTGGCTCAGCCACATCCCAAAGGGGGCACATGTCCCTGGAGTTGCTTCCAGCTGCCAAGG 
CCTGTGACAGAATTCGCTGTTAAGAGTTTTTAATTAAAATTATTAAATTCCTTTTAATAA 
CAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAkAAAAAAAAAA 



In a search of public sequence databases, the NOV72 nucleic acid sequence, located on
chromosome 20, has 5593 of 5597 bases (99%) identical to a
gb:GENBANK-ID:AB020630|acc:AB020630.1 mRNA from Homo sapiens (Homo sapiens mRNA for
KIAA0823 protein, partial cds). Public nucleotide databases include all GenBank databases
and the GeneSeq patent database.

The disclosed NOV72 polypeptide (SEQ ID NO: 172) encoded by SEQ ID NO: 171 has 
567 amino acid residues and is presented in Table 72B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV72 has no signal peptide and is 
likely to be localized in the nucleus with a certainty of 0.9800. 
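
The stated lengths can be cross-checked with simple codon arithmetic. A 567-residue polypeptide needs 567 sense codons plus one stop codon. The sketch below assumes the open reading frame begins at nucleotide 1, which matches the ATG that opens the listing in Table 72A. The helper name and constants are illustrative only.

```python
def orf_length_nt(residues, include_stop=True):
    """Nucleotides needed to encode a polypeptide of the given length."""
    if residues < 0:
        raise ValueError("residue count must be non-negative")
    codons = residues + (1 if include_stop else 0)
    return 3 * codons


NOV72_TOTAL_NT = 6108   # length of the disclosed NOV72 nucleic acid
NOV72_RESIDUES = 567    # length of the disclosed NOV72 polypeptide

coding_nt = orf_length_nt(NOV72_RESIDUES)     # ATG through the stop codon
remaining_nt = NOV72_TOTAL_NT - coding_nt     # sequence after the stop codon

if __name__ == "__main__":
    print(f"coding region: {coding_nt} nt; remainder: {remaining_nt} nt")
```

On these assumptions the coding region ends at nucleotide 1704, which is where the TAG triplet falls in the 29th 60-base row of the listing.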



Table 72B. Encoded NOV72 protein sequence (SEQ ID NO:172). 



I^SHVDLLTELQLLEKVPTLERLRAAQKRRAQQLKKWAQYEQDLQHRKRKHERKRSTGGR 
RKKVS FE AS VALLE ASLRNDAEEVRYFLKNKVS PDLCNEDGLTALHQCC IDNFEE IVKLL 
LSHGAJWNAKDNELWTPLHAAATCGHINLVKILVQYGADLLAVNSDGNMPYDLCEDEPTL 
DVIETCMAYQGITQEKINEMRVAPEQQMIADIHCMIAAGQDLDWIDAQGATLLHIAGANG 
YLRAAELLLDHGWVDVKDWDGWEPLHAAAFWGQMQM^LLVSHGASLSARTSMDEMPID 
LCEEEEFKVLLLELKHKHDVIMKSQLRHKSSLSRRTSSAGSRGKWRRASLSDRTNLYRK 
EYEGEAILWQRSAAEDQRTSTYNGDIRETRTDQENKDPNPRLEKPVLLSEFPTKIPRGEL 
DMPVENGLRAPVSAYQYALANGDVWKVHEVPDYSMAYGNPGVADATPPWSSYKEQSPQTL
LELKRQRAAAKLLSHPFLSTHLGSSMARTGESSSEGKAPLIGGRTSPYSSNGTSVYYTVT
SGDPPLLKFKAPIEEMEEKVHGCCRIS



A search of sequence databases reveals that the NOV72 amino acid sequence has
412 of 412 amino acid residues (100%) identical to, and 412 of 412 amino acid residues
(100%) similar to, the 412 amino acid residue ptnr:SPTREMBL-ACC:O94912 protein from
Homo sapiens (Human) (KIAA0823 PROTEIN). Public amino acid databases include the 
GenBank databases, SwissProt, PDB and PIR. 
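
By way of nonlimiting illustration, the identity and similarity percentages recited above can be reproduced from the raw counts. The following Python sketch assumes that percentages are truncated to a whole number rather than rounded. That assumption appears consistent with the figures quoted in this section (for example, 5593 of 5597 bases reported as 99%) but is not stated by the search tools, and the function name percent_identity is hypothetical.

```python
def percent_identity(matches: int, aligned: int) -> int:
    """Return matches/aligned as a whole-number percentage.

    Truncation (not rounding) is an assumed convention, inferred from
    the figures quoted in the text.
    """
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    if not 0 <= matches <= aligned:
        raise ValueError("matches must lie between 0 and the aligned length")
    # Integer arithmetic avoids floating-point surprises near whole percentages.
    return (100 * matches) // aligned


if __name__ == "__main__":
    print(percent_identity(5593, 5597))  # NOV72 nucleic acid comparison
    print(percent_identity(412, 412))    # NOV72 amino acid comparison
```

The same function may be applied to the "Identity" and "Positives" columns of the BLAST tables, where each entry is likewise given as a count over an aligned length.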

NOV72 is expressed in at least Adrenal Gland/Suprarenal gland, Amygdala, Artery, 
Bone, Brain, Dermis, Heart, Hippocampus, Kidney, Lung, Lymph node, Lymphoid tissue, 
Mammary gland/Breast, Pancreas, Peripheral Blood, Small Intestine, Spleen, Stomach, 
Substantia Nigra, Synovium/Synovial membrane, Thalamus, Tonsils, Umbilical Vein, Urinary 
Bladder, and Uterus. This information was derived by determining the tissue sources of the 
sequences that were included in the invention including but not limited to SeqCalling sources, 
Public EST sources, Literature sources, and/or RACE sources. 

The disclosed NOV72 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 72C.



Table 72C. BLAST results for NOV72 


Gene Index/Identifier                            Protein/Organism                                Length (aa)  Identity (%)    Positives (%)   Expect
gi|14770818|ref|XP_028840.1| (XM_028840)         KIAA0823 protein [Homo sapiens]                 567          567/567 (100%)  567/567 (100%)  0.0
gi|14029700|gb|AAK52795.1|AF362909_1 (AF362909)  CAAX box protein TIMAP [Bos taurus]             568          550/568 (96%)   557/568 (97%)   0.0
gi|18157719|gb|AAL62093.1|AF423761_1 (AF423761)  protein phosphatase 1 regulatory subunit 16B    568          547/568 (96%)   556/568 (97%)   0.0
                                                 [Mus musculus]
gi|4240132|dbj|BAA74846.1| (AB020630)            KIAA0823 protein [Homo sapiens]                 412          412/412 (100%)  412/412 (100%)  0.0
gi|9368796|emb|CAB98284.1| (AL121889)            dJ1076E17.1 (KIAA0823 protein (continues in     411          411/411 (100%)  411/411 (100%)  0.0
                                                 AL023803)) [Homo sapiens]


Table 72D lists the domain descriptions from DOMAIN analysis results against
NOV72. This indicates that the NOV72 sequence has properties similar to those of other
proteins known to contain this domain.



Table 72D. Domain Analysis of NOV72 

gnl|Pfam|pfam00023, ank, Ank repeat. Ankyrin repeats generally consist
of a beta, alpha, alpha, beta order of secondary structures. The 
repeats associate to form a higher order structure. 

CD-Length = 33 residues, 100.0% aligned 

Score = 47.4 bits (111), Expect = 2e-06 



Ankyrin repeats are tandemly repeated modules of about 33 amino acids. They occur
in a large number of functionally diverse proteins, mainly from eukaryotes. The few known
examples from prokaryotes and viruses may be the result of horizontal gene transfers. The
conserved fold of the ankyrin repeat unit is known from several crystal and solution structures,
e.g., from p53-binding protein 53BP2, cyclin-dependent kinase inhibitor p19Ink4d,
transcriptional regulator GABP-beta, and NF-kappaB inhibitory protein IkB-alpha. It has
been described as an L-shaped structure consisting of a beta-hairpin and two alpha-helices.
Many ankyrin repeat regions are known to function as protein-protein interaction domains.

The disclosed NOV72 nucleic acid of the invention encoding an ankyrin repeat
containing protein-like protein includes the nucleic acid whose sequence is provided in Table 
72A or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of 
whose bases may be changed from the corresponding base shown in Table 72A while still 
encoding a protein that maintains its ankyrin repeat containing protein-like activities and 
physiological functions, or a fragment of such a nucleic acid. The invention further includes 
nucleic acids whose sequences are complementary to those just described, including nucleic 
acid fragments that are complementary to any of the nucleic acids just described. The 
invention additionally includes nucleic acids or nucleic acid fragments, or complements 
thereto, whose structures include chemical modifications. Such modifications include, by way 
of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones 
are modified or derivatized. These modifications are carried out at least in part to enhance the 
chemical stability of the modified nucleic acid, such that they may be used, for example, as 
antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or 
variant nucleic acids, and their complements, up to about 1 percent of the bases may be so 
changed. 

The disclosed NOV72 protein of the invention includes the ankyrin repeat containing 
protein-like protein whose sequence is provided in Table 72B. The invention also includes a 
mutant or variant protein any of whose residues may be changed from the corresponding 
residue shown in Table 72B while still encoding a protein that maintains its ankyrin repeat 
containing protein-like activities and physiological functions, or a functional fragment thereof. 
In the mutant or variant protein, up to about 0 percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this ankyrin repeat 
containing protein-like protein (NOV72) may function as a member of an "ankyrin repeat
containing protein family". Therefore, the NOV72 nucleic acids and proteins identified here
may be useful in potential therapeutic applications implicated in (but not limited to) various
pathologies and disorders as indicated below. The potential therapeutic applications for this
invention include, but are not limited to: protein therapeutic, small molecule drug target,
antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or
prognostic marker, gene therapy (gene delivery/gene ablation), research tools, tissue 
regeneration in vivo and in vitro of all tissues and cell types composing (but not limited to) 
those defined here. 

The NOV72 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the ankyrin repeat
containing protein-like protein (NOV72) may be useful in gene therapy, and the ankyrin repeat
containing protein-like protein (NOV72) may be useful when administered to a subject in need
thereof. By way of nonlimiting example, the compositions of the present invention will have
efficacy for treatment of patients suffering from Cardiomyopathy, Atherosclerosis,
Hypertension, Congenital heart defects, Aortic stenosis, Atrial septal defect (ASD),
Atrioventricular (A-V) canal defect, Ductus arteriosus, Pulmonary stenosis, Subaortic
stenosis, Ventricular septal defect (VSD), valve diseases, Tuberous sclerosis, Scleroderma,
Obesity, Transplantation, Systemic lupus erythematosus, Autoimmune disease, Asthma,
Emphysema, Scleroderma, allergy, Diabetes, Autoimmune disease, Renal artery stenosis,
Interstitial nephritis, Glomerulonephritis, Polycystic kidney disease, Systemic lupus
erythematosus, Renal tubular acidosis, IgA nephropathy, Hypercalcemia, Lesch-Nyhan
syndrome, Von Hippel-Lindau (VHL) syndrome, Alzheimer's disease, Stroke, Tuberous
sclerosis, hypercalcemia, Parkinson's disease, Huntington's disease, Cerebral palsy, Epilepsy,
Multiple sclerosis, Ataxia-telangiectasia, Leukodystrophies, Behavioral disorders, Addiction,
Anxiety, Pain, Neuroprotection, or other pathologies or conditions. The NOV72 nucleic acid
encoding the ankyrin repeat containing protein-like protein of the invention, or fragments
thereof, may further be useful in diagnostic applications, wherein the presence or amount of
the nucleic acid or the protein is to be assessed.

NOV72 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV72 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti- 
NOVX Antibodies" section below. The disclosed NOV72 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 
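
By way of nonlimiting illustration, hydrophilic regions of the kind mentioned above may be located with a sliding-window hydropathy profile. The sketch below assumes the Kyte-Doolittle hydropathy scale, a window of 9 residues and a cutoff of -1.5. None of these choices is specified in this disclosure, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed scale; the disclosure
# refers only to "hydrophobicity charts" without naming one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean hydropathy of every window of the given width."""
    seq = "".join(protein.split()).upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in seq]  # raises KeyError on a non-standard residue
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def hydrophilic_windows(protein, window=9, cutoff=-1.5):
    """0-based start positions of windows whose mean falls below the cutoff."""
    profile = hydropathy_profile(protein, window)
    return [i for i, mean in enumerate(profile) if mean < cutoff]


if __name__ == "__main__":
    # One 59-residue line of Table 72B, used only as sample input.
    fragment = "LCEEEEFKVLLLELKHKHDVIMKSQLRHKSSLSRRTSSAGSRGKWRRASLSDRTNLYRK"
    print(hydrophilic_windows(fragment))
```

Windows with strongly negative mean hydropathy are more likely to be surface-exposed and are therefore candidate immunogens; any such prediction would still need experimental confirmation.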

NOV73

A disclosed NOV73 nucleic acid of 1011 nucleotides (also referred to as CG57313-01)
encoding a GPCR-like protein is shown in Table 73A. The start and stop codons are in bold 
letters. 



Table 73A. NOV73 nucleotide sequence (SEQ ID NO:173).

CATGGATTTCCCTTAAGAAAAAACAGAATGATCAATGATAGCCACTTCAGTGGTTTTATACTCCTTGGAT
TCACAGGGCAGCCTCAGCTTCAGATGATGATCTCTGGGGTTGTCTTTTTCTTCTACACTATTGCCTTCAT 
GGGAAATATGGCCATCATCCTATTGTCTTTCCTAGATGACCATCTCCAAGTCCCCATGTACTTCTTCCTT 
AGAAATTTGGCCATCTTGGATCTCTGTTATACCACAAATATAGTCCCACAAATGTTGGTCAGTATCTGGG 
GCAAAGACAAAAGAATTACCTTTGGTGGGTGTGCCTTTCAACTTTTCATTGATGTGGCACTGTACTCAGT 
TGAATGCATCCTTCTGTCCATGATGTCATATGATCGACTCAATGCTATCTGCAAGCCTCTGCATCATATG 
ACCATAATGAACCTCCAACTCTGCCAGGGCCTTGTGGTCATCTCCTGGGTAGTTGGTGTGATTAATTGCA 
TCATACCTTCCCCTTATGCCACGAGTCTTCCTCGATGTAGGAACCACCACCTAGACCACTTTTTTGTGTG 
TGTGAAATGTCTGCAAAGATCAAGATTCAAGATTGCATGTGTGGACACCACAGCCATGGAGGTAACCACA 
TTTGCCATGTGCCTGATTATAGTTCTTGTTCCTCTTCTTCTTATTCTTGTGTCATATGGTTTCATTGCTG 
TGGCTGTACTCAAGATCAAGTCTGCAGCAGGAAGACAAAAAGCATTTGGGACCTGTTCCTCCCATCTCGT 
TGTGGTATCCATCTTCTGTGGGACAGTTACATACATGTATATACAGCCAGGAAACAGTCCAAATCAGAAT
GAGGGCAAACTTCTCAGTATATTTTACTCCATTGTTACTCCCAGCTTGAACCCATTAATTTATACGGTAA 
GGAATAAGGAGTTCAAGGGGGCCATGAAGAGGCTAACTGGAAAAGAAAAAGATTGCATGGAAAAAAGAGG
ACATTGATTCTTCCTCCCAGCAATTTCTAAT 



In a search of public sequence databases, the NOV73 nucleic acid sequence, located on
chromosome 6, has 223 of 363 bases (61%) identical to a
gb:GENBANK-ID:U86270|acc:U86270.1 mRNA from Homo sapiens (Homo sapiens olfactory receptor
(OR5-40) gene, partial cds). Public nucleotide databases include all GenBank databases and 
the GeneSeq patent database. 

The disclosed NOV73 polypeptide (SEQ ID NO: 174) encoded by SEQ ID NO: 173 has 
319 amino acid residues and is presented in Table 73B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV73 has a signal peptide and is 
likely to be localized extracellularly with a certainty of 0.6400. The most likely cleavage site 
for a NOV73 peptide is between amino acids 42 and 43. 
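
By way of nonlimiting illustration, the relationship between the nucleotide sequence of Table 73A and the polypeptide of Table 73B can be checked by conceptual translation. The sketch below assumes the standard genetic code and takes the reading-frame offset as an explicit argument. The offset 27 used in the example is the 0-based position of the ATG that opens the MINDSH... reading frame within the first 70 bases of Table 73A, and the function name translate_orf is hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumed; no alternative
# code is mentioned in the disclosure).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate_orf(dna, start):
    """Translate from a 0-based start offset until a stop codon or the end.

    Whitespace is ignored; codons containing unexpected characters are
    reported as 'X'.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":  # stop codon ends the open reading frame
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # First 70 bases of Table 73A with extraction spaces removed.
    fragment = (
        "CATGGATTTCCCTTAAGAAAAAACAGAATGATCAATGATAGCCACTTCAGTGGTTTTATACTCCTTGGAT"
    )
    print(translate_orf(fragment, 27))  # MINDSHFSGFILLG
```

The start offset is passed explicitly because the first ATG in the fragment (at offset 1) lies in a different frame and runs into a stop codon after four residues.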



Table 73B. Encoded NOV73 protein sequence (SEQ ID NO: 174). 

MINDSHFSGFILLGFTGQPQLQMMISGWFFFYTIAFMGNMAIILLSFLDDHLQVPMYFF 
LRNLAILDLCYTTNTVPQMLVSIWGKDKRITFGGCAFQLFIDVALYSVECILLSMMSYDR
LNAICKPLHHMTIMNLQLCQGLWISWWGVINCIIPSPYATSLPRCRNHHLDHFFVCVK 
CLQRSRFKIACVDTTAMEVTTFAMCLIIVLVPLLLILVSYGFIAVAVLKIKSAAGRQKAF 
GTCSSHLVWSIFCGTVTYMYIQPGNSPNQNEGKLLSIFYSIVTPSLNPLIYTVRNKEFK
GAMKRLTGKEKDCMEKRGH 



A search of sequence databases reveals that the NOV73 amino acid sequence has
165 of 304 amino acid residues (54%) identical to, and 226 of 304 amino acid residues (74%)
similar to, the 320 amino acid residue ptnr:SPTREMBL-ACC:Q9Y3N9 protein from Homo
sapiens (Human) (DJ88J8.1 (NOVEL 7 TRANSMEMBRANE RECEPTOR (RHODOPSIN
FAMILY) (OLFACTORY RECEPTOR LIKE) PROTEIN) (HS6M1-15))). Public amino acid 
databases include the GenBank databases, SwissProt, PDB and PIR. 

The disclosed NOV73 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 73C.



Table 73C. BLAST results for NOV73 


Gene Index/Identifier                     Protein/Organism                             Length (aa)  Identity (%)   Positives (%)  Expect
gi|18480000|gb|AAL61014.1| (AY073351)     olfactory receptor MOR256-8 [Mus musculus]   308          169/307 (55%)  232/307 (75%)  8e-88
gi|13624329|ref|NP_112165.1| (NM_030903)  olfactory receptor, family 2, subfamily W,   320          165/304 (54%)  226/304 (74%)  5e-86
                                          member 1 [Homo sapiens]
gi|18480974|gb|AAL61501.1| (AY073838)     olfactory receptor MOR256-31 [Mus musculus]  312          168/304 (55%)  227/304 (74%)  5e-86
gi|12054429|emb|CAC20522.1| (AJ302602)    olfactory receptor [Mus musculus]            320          165/304 (54%)  225/304 (73%)  e-85
gi|12054431|emb|CAC20523.1| (AJ302603)    olfactory receptor [Homo sapiens]            320          164/304 (53%)  226/304 (73%)  e-85



Table 73D lists the domain descriptions from DOMAIN analysis results against 
NOV73. This indicates that the NOV73 sequence has properties similar to those of other 
proteins known to contain this domain.



Table 73D. Domain Analysis of NOV73 

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin
family).

CD-Length = 254 residues, 100.0% aligned

Score = 92.0 bits (227), Expect = 4e-20



The protein sequence fingerprint is potently diagnostic of all sequences of this type in
the database from which it was derived (the OWL composite sequence database, version 8.1), and
has continued to perform well on subsequent database updates, identifying 240 receptors in
OWL17.0. Results are compared with a commonly used pattern template for this class of
receptors. The investigation suggests that discriminating power is improved in the fingerprint
approach because the recognition of individual features is made mutually conditional.
Furthermore, by avoiding the definition of predetermined feature separations, members of
protein families possessing all or only part of the fingerprint may be identified. PMID:
8386361

The fingerprint encodes the seven putative membrane-spanning motifs and was
potently diagnostic of all GPCRs (52 in all) in version 8.1 of the OWL composite sequence
database, readily distinguishing them from all other integral membrane proteins. With a 3-fold
increase in the size of OWL, the fingerprint has been updated and now finds 332 receptors that
match all the motifs.

The glucagon receptor is a member of a distinct class of G protein-coupled receptors
(GPCRs) sharing little amino acid sequence homology with the larger rhodopsin-like GPCR
family. To identify the components of the glucagon receptor necessary for G-protein coupling,
sequentially all or part of each intracellular loop (i1, i2, and i3) and the C-terminal tail of the
glucagon receptor were replaced with the 11 amino acids comprising the first intracellular loop
of the D4 dopamine receptor.

Whereas numerous mutations of the human lutropin receptor (hLHR) and human TSH
receptor (hTSHR) have been shown to cause constitutive activation of these receptors, it has
been suggested that either the hFSHR as a whole, or the i3/TM VI region of the hFSHR, is less
susceptible to mutation-induced constitutive activation.

The disclosed NOV73 nucleic acid of the invention encoding a GPCR-like protein
includes the nucleic acid whose sequence is provided in Table 73A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 73A while still encoding a protein that maintains
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 39 percent of the
bases may be so changed.

The disclosed NOV73 protein of the invention includes the GPCR-like protein whose
sequence is provided in Table 73B. The invention also includes a mutant or variant protein
any of whose residues may be changed from the corresponding residue shown in Table 73B
while still encoding a protein that maintains its GPCR-like activities and physiological
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 46
percent of the residues may be so changed.
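
By way of nonlimiting illustration, the variation ceilings recited in this section appear to equal 100 minus the percent identity found in the corresponding database search (for NOV73, 61% nucleic acid identity against up to about 39 percent of bases, and 54% amino acid identity against up to about 46 percent of residues). This relationship is an inference from the numbers quoted here, not a rule stated in the disclosure, and the function names below are hypothetical.

```python
def max_variation_percent(identity_percent: int) -> int:
    """Inferred ceiling on changed positions: 100 minus the percent identity."""
    if not 0 <= identity_percent <= 100:
        raise ValueError("identity must be between 0 and 100")
    return 100 - identity_percent


def within_ceiling(changed: int, length: int, identity_percent: int) -> bool:
    """True when changed/length does not exceed the inferred ceiling."""
    if length <= 0 or not 0 <= changed <= length:
        raise ValueError("need 0 <= changed <= length and length > 0")
    # Cross-multiplied to stay in integer arithmetic.
    return 100 * changed <= max_variation_percent(identity_percent) * length


if __name__ == "__main__":
    print(max_variation_percent(61))     # nucleic acid ceiling for NOV73
    print(max_variation_percent(54))     # protein ceiling for NOV73
    print(within_ceiling(146, 319, 54))  # 146 of 319 residues changed
```

Under this reading, a 319-residue variant with 146 changed residues falls inside the 46 percent ceiling, whereas one with 147 changed residues does not.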

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this GPCR-like protein
(NOV73) may function as a member of a "GPCR family". Therefore, the NOV73 nucleic
acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV73 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein
(NOV73) may be useful in gene therapy, and the GPCR-like protein (NOV73) may be useful
when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD;
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease,
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders;
psoriasis, colon cancer, leukemia, AIDS; thalamus disorders; metabolic disorders including
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic
disorders including pancreatic insufficiency and cancer; and prostate disorders including
prostate cancer, or other pathologies or conditions. The NOV73 nucleic acid encoding the
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV73 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV73 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti- 
NOVX Antibodies" section below. The disclosed NOV73 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 

NOV74 

A disclosed NOV74 nucleic acid of 1008 nucleotides (also referred to as CG57315-01)
encoding a GPCR-like protein is shown in Table 74A. The start and stop codons are in bold 
letters. 



Table 74A. NOV74 nucleotide sequence (SEQ ID NO:175). 



TTCACAGCTGATATTCAGAAAATAGCAAAAATGATCAATGATAGTTACTTTGGTTGGCTTATGCTCCTTG 
GGTTCCCTGGGAAGCCTCAGCTGGAGATGATCATCTCTGGGGTTGTCTTTTTCTTCTATGCAATTTCTTT 
GATGGGAAATATGGTCCTTATCCTGCTGCCATTACTGGATAAACATCTCCATACCCCCATATATTTCTTT
CTTAGAAATCTGGCTATCTTGGATCTTTGTTACACCACAAATATAGTCCCACAGATGTTGGTCAATGCCT 
GGGGTAAAGACAAGAAAATCACTTTTGGTGGCTGTGCTTTTCAACTTTTCACTAATGTGACGCTATGCAC 
GGTTGAATGTATGCTTCTGGCTGTGATGTCATATGACCCATTCAATGCTGTCTGCAAGCCTCTGGACTAT 
ATGACCATAATGAACCCCCAACTCTGTCAAGGCCTGGTGGCCATGACCTGGTTAATTGGTGTCACTAATT 
GCATGATACTTTCCCCCTGTCCTGTGAGTCTTCCTCGATGCGGAGACCACCACCTGGATCACTATTTTTG 
TGAAATATCTGCAATGGTCAAAATTGCATGTGGGGCTACCACAGTCATGGAGTTGCATTGTGTTGTTGTT 
GTTGTTTTCATTTTCCTTGCATCACTTCTTCTCATTCTTGTGTCATATGGCTTCATTGCTGTGGCTGTAC 
TCAAGATCAAGTCTGCAGCAGGAAGACAAAAAGCATTTGGGACCTGTTTCTCCCATCTCATTGTGGTATC 
CATCTTCTATGGGACTGTTAGATATATGTATATAGAGCCAGGAAACAGTCCATCTCAGGATGAGGGCAAA
CTTCTCCATATATTTTACTCCATTGTTACTCCCACCTTGAACCCAATCCCACTAAGGAATAAGGAGTTCA 
AGTGGGCCATGAAAAGGCTTATTGGAAAAGAAAAAGGTTCTGGAGACACAATAGGTCACTAACATCTTTT 
TACAAGAAATTCCTGGCCGGGCACGGTG 



In a search of public sequence databases, the NOV74 nucleic acid sequence, located on
chromosome 6, has 583 of 946 bases (61%) identical to a
gb:GENBANK-ID:RATOL1RECE|acc:L34074.1 mRNA from Rattus norvegicus (Rat OL1 receptor gene,
complete cds). Public nucleotide databases include all GenBank databases and the GeneSeq 
patent database. 

The disclosed NOV74 polypeptide (SEQ ID NO:176) encoded by SEQ ID NO:175 has
313 amino acid residues and is presented in Table 74B using the one-letter amino acid code. 
Signal P, Psort and/or Hydropathy results predict that NOV74 has a signal peptide and is 
likely to be localized extracellularly with a certainty of 0.6000. The most likely cleavage site
for a NOV74 peptide is between amino acids 63 and 64. 



Table 74B. Encoded NOV74 protein sequence (SEQ ID NO: 176). 



MINDSYFGWLMLLGFPGKPQLEMIISGWFFFYAISLMGNMVLILLPLLDKHLQTPIYFF 
LRNIAILDLCYTTNIVPQMLWAWGKDKKITFGGCAFQLFTNVTLCTVECMLIAVMSYDP 
FNAVCKPLDYMTIMNPQLCQGLVAMTWLIGVTNCMILSPCPVSLPRCGDHHLDHYFCEIS
AMVKIACGATTVMELHCVVVVVFIFLASLLLILVSYGFIAVAVLKIKSAAGRQKAFGTCF
SHLIWSIFYGTVRYMYIEPGNSPSQDEGKLLHIFYSIVTPTLNPIPLRNKEFKWAMKRL
IGKEKGSGDTIGH 

A search of sequence databases reveals that the NOV74 amino acid sequence has 161
of 298 amino acid residues (54%) identical to, and 220 of 298 amino acid residues (73%) 
similar to, the 320 amino acid residue ptnr:SPTREMBL-ACC:Q9Y3N9 protein from Homo 
sapiens (Human) (DJ88J8.1 (NOVEL 7 TRANSMEMBRANE RECEPTOR (RHODOPSIN 
FAMILY) (OLFACTORY RECEPTOR LIKE) PROTEIN) (HS6M1-15))). Public amino acid 
databases include the GenBank databases, SwissProt, PDB and PIR. 

NOV74 is expressed in at least Whole Organism. Expression information was derived
from the tissue sources of the sequences that were included in the derivation of the sequence
of CG57315-01. The sequence is predicted to be expressed in the following tissues because of
the expression pattern of (GENBANK-ID: gb:GENBANK-ID:RATOL1RECE|acc:L34074.1),
a closely related Rat OL1 receptor gene, complete cds homolog in species Rattus norvegicus:
heart. This information was derived by determining the tissue sources of the sequences that 
were included in the invention including but not limited to SeqCalling sources, Public EST 
sources, Literature sources, and/or RACE sources. 

The disclosed NOV74 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 74C. 



Table 74C. BLAST results for NOV74 


Gene Index/Identifier                     Protein/Organism                              Length (aa)  Identity (%)    Positives (%)   Expect
gi|17464347|ref|XP_069460.1| (XM_069460)  similar to dM538M10.7 (novel 7 transmembrane  590          186/186 (100%)  186/186 (100%)  e-100
                                          receptor (rhodopsin family) (olfactory
                                          receptor like) protein) [Homo sapiens]
gi|18480974|gb|AAL61501.1| (AY073838)     olfactory receptor MOR256-31 [Mus musculus]   312          167/293 (56%)   218/293 (73%)   e-84
gi|18480000|gb|AAL61014.1| (AY073351)     olfactory receptor MOR256-8 [Mus musculus]    308          159/303 (52%)   225/303 (73%)   2e-83
gi|13624329|ref|NP_112165.1| (NM_030903)  olfactory receptor, family 2, subfamily W,    320          161/301 (53%)   219/301 (72%)   3e-83
                                          member 1 [Homo sapiens]
gi|12054429|emb|CAC20522.1| (AJ302602)    olfactory receptor [Homo sapiens]             320          161/301 (53%)   218/301 (71%)   6e-83



Table 74D lists the domain descriptions from DOMAIN analysis results against 
NOV74. This indicates that the NOV74 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 74D. Domain Analysis of NOV74 

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin
family). 

CD-Length = 254 residues, 99.2% aligned 

Score = 96.3 bits (238), Expect = 2e-21 

The disclosed NOV74 nucleic acid of the invention encoding a GPCR-like protein 
includes the nucleic acid whose sequence is provided in Table 74A or a fragment thereof. The 
invention also includes a mutant or variant nucleic acid any of whose bases may be changed 
from the corresponding base shown in Table 74A while still encoding a protein that maintains 
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The 
invention further includes nucleic acids whose sequences are complementary to those just 
described, including nucleic acid fragments that are complementary to any of the nucleic acids 
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or 
complements thereto, whose structures include chemical modifications. Such modifications 
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least 
in part to enhance the chemical stability of the modified nucleic acid, such that they may be 
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. 
In the mutant or variant nucleic acids, and their complements, up to about 39 percent of the 
bases may be so changed. 

The disclosed NOV74 protein of the invention includes the GPCR-like protein whose
sequence is provided in Table 74B. The invention also includes a mutant or variant protein 
any of whose residues may be changed from the corresponding residue shown in Table 74B 
while still encoding a protein that maintains its GPCR-like activities and physiological 
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 46 
percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this GPCR-like protein 
(NOV74) may function as a member of a "GPCR family". Therefore, the NOV74 nucleic 
acids and proteins identified here may be useful in potential therapeutic applications 
implicated in (but not limited to) various pathologies and disorders as indicated below. The 
potential therapeutic applications for this invention include, but are not limited to: protein 
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug 
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene 
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues 
and cell types composing (but not limited to) those defined here. 

The NOV74 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein 
(NOV74) may be useful in gene therapy, and the GPCR-like protein (NOV74) may be useful 
when administered to a subject in need thereof. By way of nonlimiting example, the 
compositions of the present invention will have efficacy for treatment of patients suffering 
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD; 
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease, 
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders; 
psoriasis, colon cancer, leukemia, AIDS; thalamus disorders; metabolic disorders including
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic
disorders including pancreatic insufficiency and cancer; and prostate disorders including
prostate cancer, or other pathologies or conditions. The NOV74 nucleic acid encoding the
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed. 

NOV74 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV74 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti- 
NOVX Antibodies" section below. The disclosed NOV74 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 

NOV75 

A disclosed NOV75 nucleic acid of 1050 nucleotides (also referred to as CG57317-01)
encoding a GPCR-like protein is shown in Table 75A. The start and stop codons are in bold 
letters. 



Table 75A. NOV75 nucleotide sequence (SEQ ID NO:177). 

ACTTTTTAAAGATTGATATTTTGCCCAATGGCCAACACATTATCCTCCCTGAATTCTTGTAATGTGTTTC 
TCCTAGTTCTGAACAGGGTGATGGGCATGACCAACAGCAGTGTCAAGGGAGACTTCATCCTGGTGGGTTT 
CTCTCATCAGCCCCACCTGGAAAAGATCCTCTTTGTGGCTGTTTTGATATCCTATCTCCTTACCCTTGTG 
GGAAATACAGTAATTATTCTGATCTGCTCTGTAGACCCTAAACTCAAGACACCCATGTATTTTTTCTTAA
CTCACCTCTCCTTAGTTGATATCTGTTTTACCACCAGTATTGTCCCCCAGCTGCTGTGGAACCTAAAAGG
ACCTGACAAAACAATCACATTCCTGGGTTGTGTCATCCAGCTCTACATCTCCCTGGCATTGGGCTCCACT 
GAGTGTGTCCTCCTGGCTGTAATGGCTTTTGATCGCTATGCTGCAGTTTGCAAACCTCTCCACTATACCG 
CCGTAATGAACCCTCAGCTGTGCCAGGCTCTGGCAGGGGTTGCGTGGCTGAGTGGAGTGGGAAACACTCT 
TATCCAGGGCACTGTCACCCTCTGGCTTCCTCGCTGTGGACACCGATTGCTCCAACATTTTCTTCGTGAG 
GTACCCTCCATGATTAAGCTTGCATGTGTGGACATCCATGATAATGAGGTTCAGCTCTTTGTTGCTTCAC 
TGGTCTTGCTCCTCTTGCCCTTAGTGCTAATACTGCTGTCCTATGGACATATAGCCAAGGTGGTCATAAG 
GATCAAGTCAGTCCAGGCCTGGTGCAAAGGCCTGGGGACATGTGGATCCCATTTGATAGTAGTGTCCCTC
TTCTGTGGGACCATCACAGCTGTCTACATCCAGTCCAACAGTTCTTATGCCCATGCTCATGGGAAGTTCA 
TCTCCCTCTTCTATACAGTTGTGACCCCGACCCTCAATCCTCTCATCTACACACTGAGGAATAATGACGT 
GAAAGGAGCACTGCGATTATTTAACAGAGACTTAGGCACATAAAAAATGAAGCAGAGTACACAGCGCTCA



In a search of public sequence databases, the NOV75 nucleic acid sequence, located on
chromosome 6, has 611 of 912 bases (66%) identical to a
gb:GENBANK-ID:HUMORLMHC|acc:L35475.1 mRNA from Homo sapiens (Human olfactory receptor-like
gene, complete cds). Public nucleotide databases include all GenBank databases and the
GeneSeq patent database.
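
The identity percentages quoted in this section appear to be whole numbers obtained by truncation rather than rounding (611 of 912 bases is 66.99%, reported as 66%). The following Python sketch shows that arithmetic under this assumption; the function name is illustrative only and is not part of the disclosure.

```python
def percent_identity(matches: int, aligned_length: int) -> int:
    """Return matches/aligned_length as a whole-number percentage.

    Assumption: the percentage is truncated (floored), not rounded.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= matches <= aligned_length:
        raise ValueError("matches must lie between 0 and aligned_length")
    # Integer arithmetic avoids floating-point rounding surprises.
    return (matches * 100) // aligned_length


print(percent_identity(611, 912))  # 66
```

With rounding instead of truncation, the same counts would give 67%, so the convention matters when comparing reported values.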

The disclosed NOV75 polypeptide (SEQ ID NO:178) encoded by SEQ ID NO:177 has
331 amino acid residues and is presented in Table 75B using the one-letter amino acid code.
SignalP, Psort and/or Hydropathy results predict that NOV75 has a signal peptide and is
likely to be localized extracellularly with a certainty of 0.6400. The most likely cleavage site
for a NOV75 peptide is between amino acids 23 and 24.

Table 75B. Encoded NOV75 protein sequence (SEQ ID NO:178). 

MANTLSSLNSCNVFLLVLNRVMGMTNSSVKGDFILVGFSHQPHLEKILFVAVLISYLLTL
VGNTVIILICSVDPKLKTPMYFFLTHLSLVDICFTTSIVPQLLWNLKGPDKTITFLGCVI
QLYISLALGSTECVLLAVMAFDRYAAVCKPLHYTAVMNPQLCQALAGVAWLSGVGNTLIQ
GTVTLWLPRCGHRLLQHFLREVPSMIKLACVDIHDNEVQLFVASLVLLLLPLVLILLSYG
HIAKVVIRIKSVQAWCKGLGTCGSHLIVVSLFCGTITAVYIQSNSSYAHAHGKFISLFYT
VVTPTLNPLIYTLRNNDVKGALRLFNRDLGT
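
The correspondence between Table 75A and Table 75B can be checked by translating the open reading frame that starts at the first ATG. The Python sketch below assumes the standard genetic code and, for brevity, uses only the first 70-base line of Table 75A; the function and variable names are illustrative and not part of the disclosure.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from_first_atg(dna: str) -> str:
    """Translate from the first ATG until a stop codon or the end of the input."""
    dna = "".join(dna.split()).upper()  # drop whitespace left by text extraction
    start = dna.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unreadable codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First line of Table 75A (SEQ ID NO:177).
first_line = "ACTTTTTAAAGATTGATATTTTGCCCAATGGCCAACACATTATCCTCCCTGAATTCTTGTAATGTGTTTC"
print(translate_from_first_atg(first_line))  # MANTLSSLNSCNVF
```

The printed residues match the first 14 residues of Table 75B. Applying the same function to the full 1050-base sequence is one way to cross-check residues that were damaged in extraction.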

A search of sequence databases reveals that the NOV75 amino acid sequence has 178
of 306 amino acid residues (58%) identical to, and 234 of 306 amino acid residues (76%)
similar to, the 312 amino acid residue ptnr:SPTREMBL-ACC:Q9R0Z2 protein from Mus
musculus (Mouse) (573K1.3 (MM17M1-4 (NOVEL 7 TRANSMEMBRANE RECEPTOR
(RHODOPSIN FAMILY) (OLFACTORY RECEPTOR LIKE) PROTEIN))). Public amino
acid databases include the GenBank databases, SwissProt, PDB and PIR.

The disclosed NOV75 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 75C. 



Table 75C. BLAST results for NOV75

Columns: Gene Index/Identifier; Protein/Organism; Length (aa); Identity (%); Positives (%); Expect

gi|17464347|ref|XP_069460.1| (XM_069460)
similar to dM538M10.7 (novel 7 transmembrane receptor (rhodopsin family) (olfactory receptor like) protein) [Homo sapiens]
Length: 590; Identity: 270/322 (83%); Positives: 271/322 (83%); Expect: e-121

gi|18480232|gb|AAL61130.1| (AY073467)
olfactory receptor MOR256-9 [Mus musculus]
Length: 309; Identity: 254/308 (82%); Positives: 273/308 (88%); Expect: e-120

gi|18480518|gb|AAL61273.1| (AY073610)
olfactory receptor MOR256-19 [Mus musculus]
Length: 317; Identity: 229/298 (76%); Positives: 259/298 (86%); Expect: e-112

gi|18565094|ref|XP_094939.1| (XM_094939)
hypothetical protein XP_094939 [Homo sapiens]
Length: 310; Identity: 225/235 (95%); Positives: 226/235 (95%); Expect: e-112

gi|14596252|emb|CAC43450.1| (AL136158)
dM538M10.7 (novel 7 transmembrane receptor (rhodopsin family) (olfactory receptor like) protein) [Mus musculus]
Length: 317; Identity: 233/307 (75%); Positives: 260/307 (83%); Expect: e-111






Table 75D lists the domain descriptions from DOMAIN analysis results against 
NOV75. This indicates that the NOV75 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 75D. Domain Analysis of NOV75 

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin
family).

CD-Length = 254 residues, 46.9% aligned

Score = 78.6 bits (192), Expect = 5e-16



The disclosed NOV75 nucleic acid of the invention encoding a GPCR-like protein
includes the nucleic acid whose sequence is provided in Table 75A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 75A while still encoding a protein that maintains
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 34 percent of the
bases may be so changed.
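
The complementary nucleic acids mentioned above can be illustrated by computing a reverse complement. This is a minimal Python sketch assuming unmodified A, C, G and T bases only; the function name is illustrative, and the input is the first ten bases of Table 75A.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    cleaned = "".join(seq.split())  # ignore whitespace from text extraction
    return cleaned.translate(COMPLEMENT)[::-1]


print(reverse_complement("ACTTTTTAAA"))  # TTTAAAAAGT
```

Chemically modified bases or backbones, as contemplated in the paragraph above, are outside the scope of this sketch.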

The disclosed NOV75 protein of the invention includes the GPCR-like protein whose
sequence is provided in Table 75B. The invention also includes a mutant or variant protein
any of whose residues may be changed from the corresponding residue shown in Table 75B
while still encoding a protein that maintains its GPCR-like activities and physiological
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 42
percent of the residues may be so changed.
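
To make the stated limits concrete, the sketch below converts a percentage into a count of positions and measures how far a variant differs from a reference. It is an assumption-laden illustration: it treats "about 42 percent" as a hard cutoff, considers substitutions only (equal-length sequences, no insertions or deletions), and uses hypothetical function names.

```python
def max_changed_positions(length: int, max_percent: int) -> int:
    """Largest whole number of positions that is at most max_percent of length."""
    return (length * max_percent) // 100


def percent_changed(reference: str, variant: str) -> float:
    """Percentage of positions that differ between two equal-length sequences."""
    if not reference or len(reference) != len(variant):
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(a != b for a, b in zip(reference, variant))
    return 100.0 * differences / len(reference)


print(max_changed_positions(331, 42))   # 139 residues of the 331-residue protein
print(max_changed_positions(1050, 34))  # 357 bases of the 1050-base nucleic acid
```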

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this GPCR-like protein
(NOV75) may function as a member of a "GPCR family". Therefore, the NOV75 nucleic
acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV75 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein 
(NOV75) may be useful in gene therapy, and the GPCR-like protein (NOV75) may be useful 
when administered to a subject in need thereof. By way of nonlimiting example, the 
compositions of the present invention will have efficacy for treatment of patients suffering 
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD; 
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease, 
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders;
psoriasis, colon cancer, leukemia, AIDS; thalamus disorders; metabolic disorders including
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic
disorders including pancreatic insufficiency and cancer; and prostate disorders including
prostate cancer, or other pathologies or conditions. The NOV75 nucleic acid encoding the
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV75 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immunospecifically to the novel NOV75 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV75 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding the pathology of the disease and in developing new drug targets for various
disorders.
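
Selecting hydrophilic regions from a hydrophobicity chart can be sketched as a sliding-window average over a per-residue hydropathy scale. The disclosure does not name a scale or window size; the Kyte-Doolittle scale and a nine-residue window used below are assumptions chosen for illustration, as are the function names.

```python
# Kyte-Doolittle hydropathy values (assumed scale; more negative = more hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list:
    """Mean hydropathy of every window of the given length along the protein."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]  # KeyError on unknown residues
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def most_hydrophilic_window(protein: str, window: int = 9):
    """Return (start index, peptide, mean score) of the lowest-scoring window."""
    profile = hydropathy_profile(protein, window)
    if not profile:
        raise ValueError("protein is shorter than the window")
    start = min(range(len(profile)), key=profile.__getitem__)
    return start, protein[start:start + window], profile[start]


# C-terminal residues of the NOV75 protein as listed in Table 75B.
start, peptide, score = most_hydrophilic_window("VVTPTLNPLIYTLRNNDVKGALRLFNRDLGT")
print(start, peptide, round(score, 2))
```

A low mean score only suggests surface exposure; choosing an actual immunogen would also depend on the considerations described in the "Anti-NOVX Antibodies" section.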

NOV76

A disclosed NOV76 nucleic acid of 1063 nucleotides (also referred to as CG57321-01) 
encoding a GPCR-like protein is shown in Table 76A. The start and stop codons are in bold 
letters. 



Table 76A. NOV76 nucleotide sequence (SEQ ID NO:179). 

TCTCTAATTCTCAGTGGCTTCCTCCTACTGTTGATGTCTATCCCTAACTGTGGGTATTTAGAGGTCTCAG 
CTGGAATTTCACCTCCCAGTGCTAACATGTGGATCAACAATCAAAGCTCGCTAGATGATTTTATCCTATT 
GGGATTTTCTGACCGTCCCTGGCTAGAGACACCCCTCTGTGTAATCTTTCTGGTGGCCTACATCTTTTCC 
CTATTTGGAAATATCTCCATTATCCTAGTTTCCCATCTGGATCCCCAGCTTGACAGTCCCATGTACTTTT 
TTGTCTCTAATCTATCCTTTCTGGACCTCTGCTATACCACCAGCACTGTCCCACAGATGCTGGTCAACCT 
CCGGGGACCAGAAAAGACCATTAGCTATGGGGGTTGTGTTGCCCAACTCTATATATTTTTGGCCCTGGGT 
TCTACTGAATGCATACTTCTAGCCATCATGGCCTTTGACCGTTACGCTGCCATATGCAAGCCCCTTCACT
ACCCAGTCATCATGAACCATAGACGCTGTATCCACATGGCTGCTGGCACTTGGATCAGTGGCTTTGCTAA
CTCCCTTGTCCAGTCCACTCTCACAGTGGTGGCCCCAAGATGTGGACAGAGGGTGTTGGACCATTTCTTC 
TGTGAAGTTCCAGCCCTTTTGAAACTAGCCTGTATTGATATTCGTGTGAATGAAATGGAGCTCAATGTAC 
TAGGCGCTTTGCTTCTCCTGATGCCACTCACCCTCATCCTGGGCACTTATGTGTTCATTGCTCAGGCAGT 
AATGAGAATCTGCTCTGCTGAAAGTCGCTGGAAGGCTTTCAATACCTGTGCCTCACATTTGCTGGTGGTC 
TCCCTCTTCTACTTCACAGCCATCAGTATGTATGTCCAGCCTCCCTCTAGCTATTCTCATGACCGGGGGA 
AGATCATCATGGCTCTCTTTTATGGCATTGTCACACCCACCCTCAACCCATTCATCTACACATTGAGAAA 
CAAGGATGTGAAAGCTGCCTGAGAAGGTCACTGACTAAAGAGTTTTGGATTAAGACAAGATGATATCTG
AAAAGAAGTCCTA



In a search of public sequence databases, the NOV76 nucleic acid sequence, located on
chromosome 6, has 657 of 971 bases (67%) identical to a
gb:GENBANK-ID:RATOL1RECE|acc:L34074.1 mRNA from Rattus norvegicus (Rat OL1 receptor gene,
complete cds). Public nucleotide databases include all GenBank databases and the GeneSeq
patent database.

The disclosed NOV76 polypeptide (SEQ ID NO:180) encoded by SEQ ID NO:179 has
336 amino acid residues and is presented in Table 76B using the one-letter amino acid code.
SignalP, Psort and/or Hydropathy results predict that NOV76 has a signal peptide and is
likely to be localized extracellularly with a certainty of 0.6000. The most likely
cleavage site for a NOV76 peptide is between amino acids 14 and 15.



Table 76B. Encoded NOV76 protein sequence (SEQ ID NO: 180). 

MSIPNCGYLEVSAGISPPSANMWINNQSSLDDFILLGFSDRPWLETPLCVIFLVAYIFSL 
FGNISIILVSHLDPQLDSPMYFFVSNLSFLDLCYTTSTVPQMLVNLRGPEKTISYGGCVA 
QLYIFLALGSTECILLAIMAFDRYAAICKPLHYPVIMNHRRCIHMAAGTWISGFANSLVQ
STLTVVAPRCGQRVLDHFFCEVPALLKLACIDIRVNEMELNVLGALLLLMPLTLILGTYV
FIAQAVMRICSAESRWKAFNTCASHLLVVSLFYFTAISMYVQPPSSYSHDRGKIIMALFY
GIVTPTLNPFIYTLRNKDVKAALRRSLTKEFWIKTR



A search of sequence databases reveals that the NOV76 amino acid sequence has 197
of 308 amino acid residues (63%) identical to, and 245 of 308 amino acid residues (79%)
similar to, the 313 amino acid residue ptnr:SPTREMBL-ACC:Q63394 protein from Rattus
norvegicus (Rat) (OL1 RECEPTOR). Public amino acid databases include the GenBank
databases, SwissProt, PDB and PIR.

The disclosed NOV76 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 76C. 



Table 76C. BLAST results for NOV76

Columns: Gene Index/Identifier; Protein/Organism; Length (aa); Identity (%); Positives (%); Expect

gi|17464349|ref|XP_069461.1| (XM_069461)
similar to olfactory receptor, family 2, subfamily B, member 2 [Homo sapiens]
Length: 298; Identity: 297/315 (94%); Positives: 298/315 (94%); Expect: e-147

gi|18479350|gb|AAL60689.1| (AY073026)
olfactory receptor MOR256-3 [Mus musculus]
Length: 315; Identity: 287/315 (91%); Positives: 302/315 (95%); Expect: e-147

gi|11177906|ref|NP_068632.1| (NM_021860)
olfactory receptor [Mus musculus]
Length: 313; Identity: 197/308 (63%); Positives: 245/308 (78%); Expect: e-104

gi|14780900|ref|NP_149046.1| (NM_033057)
olfactory receptor, family 2, subfamily B, member 2 [Homo sapiens]
Length: 357; Identity: 199/307 (64%); Positives: 245/307 (78%); Expect: e-103

gi|18480406|gb|AAL61217.1| (AY073554)
olfactory receptor MOR256-10 [Mus musculus]
Length: 313; Identity: 196/308 (63%); Positives: 242/308 (77%); Expect: e-103



Table 76D lists the domain descriptions from DOMAIN analysis results against 
NOV76. This indicates that the NOV76 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 76D. Domain Analysis of NOV76

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin
family).

CD-Length = 254 residues, 100.0% aligned

Score = 95.9 bits (237), Expect = 3e-21

The disclosed NOV76 nucleic acid of the invention encoding a GPCR-like protein
includes the nucleic acid whose sequence is provided in Table 76A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 76A while still encoding a protein that maintains
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 33 percent of the
bases may be so changed.

The disclosed NOV76 protein of the invention includes the GPCR-like protein whose
sequence is provided in Table 76B. The invention also includes a mutant or variant protein
any of whose residues may be changed from the corresponding residue shown in Table 76B
while still encoding a protein that maintains its GPCR-like activities and physiological
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 37
percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this GPCR-like protein
(NOV76) may function as a member of a "GPCR family". Therefore, the NOV76 nucleic
acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV76 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein
(NOV76) may be useful in gene therapy, and the GPCR-like protein (NOV76) may be useful
when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD;
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease,
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders;
psoriasis, colon cancer, leukemia, AIDS; thalamus disorders; metabolic disorders including
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic
disorders including pancreatic insufficiency and cancer; and prostate disorders including
prostate cancer, or other pathologies or conditions. The NOV76 nucleic acid encoding the
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV76 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immunospecifically to the novel NOV76 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV76 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding the pathology of the disease and in developing new drug targets for various
disorders.

NOV77

A disclosed NOV77 nucleic acid of 1014 nucleotides (also referred to as CG57419-01) 
encoding a GPCR-like protein is shown in Table 77A. The start and stop codons are in bold 
letters. 



Table 77A. NOV77 nucleotide sequence (SEQ ID NO: 181). 

CATCTTTAGTGTGGTCCTTGGAAGCTCATGGGTCAGGAAAATAAAAACCAGACATGGGTGAGTGAGTTCA 
TTCTGCTGGGGATTTCCAGTGATTGGGGCATTCAGGTATCCCTCTTCGCCCTGATCCTGGCCATGTATTT 
GGTGACTATTTTAGGAAACACCCTCATTCTTCTTCTGATCAGACTGGACAACAGGCTTCATACCCCCATG 
TACTTCTCCCTTAGTGTTCTGTCATTTGTGGACTTTTGTTATACAAAGAGTATTGTCCCACAAATGCTGT 
CCCACTTGCTCTGAGCCCGAAAGTCCATCCCATTCTACAGTTGTGTGCTCCAGCTCTATGTTTCTCTGGC 
ATTGTGTGGGTCTGAGTTCTTCCTGCTGGGGGCCATGGCCTATGACCGCTACGTGGCCGTGTGCCACCCA 
CTGCACTACACGGTCATCATGCATGGAGGGCTGTGCCTGGGGCTGGCGGCCAGCCGCCTGGTGGCTGGCT 
TCTCAAATTCCCTGATGGAAACAATTATCACCTTCCAGCTTTTATCACCTTCCAGCTTCCTGTGTCACGG 
TGTTATCAATCACTTTGTCTGTGAGACCTTAGCAGTGCTACAGCTAGCCTGTGTGGATGTCCCCTTCAAC 
AAGGTCATGGTGGCCATCTCAGGGTTTCTGGTGATCTTGCTTCCCTGTTCCCTGGTTCTATTCTCCTATG 
CTTGCATAGTTGCCACCATTTTGTGCATTCGTTCTACCCAGGTACGCTGCAAAGCCTTTGGGACCTGTGC 
CTCTCACCTCATTGTGGTTTGCATGTGCTTTGGGGCTACCATCTGCACCTACCTGGGGCCACAGTTGGCC 
TCCTCAGCAGAGGAAGAGAAGATGATTGCTCTCTTCTATGGAGTGGTGTCACCCATGTTGAACCCCTTGA 
TCTACAGCTTGAGGAATAAGGAAGTTACGGCTGCTGTCCGGAAAGTTTTAGAAAGATGCAGATAAAGGGT
CAAGACTCTAAGAACCTCTTGTTATCTATCATCA


In a search of public sequence databases, the NOV77 nucleic acid sequence, located on
chromosome 7, has 358 of 495 bases (72%) identical to a
gb:GENBANK-ID:HSU56421|acc:U56421.1 mRNA from Homo sapiens (Human olfactory receptor (OLF3)
gene, complete cds). Public nucleotide databases include all GenBank databases and the
GeneSeq patent database.

The disclosed NOV77 polypeptide (SEQ ID NO:182) encoded by SEQ ID NO:181 has
315 amino acid residues and is presented in Table 77B using the one-letter amino acid code.
SignalP, Psort and/or Hydropathy results predict that NOV77 has a signal peptide and is
likely to be localized extracellularly with a certainty of 0.6000. The most likely cleavage site
for a NOV77 peptide is between amino acids 43 and 44.



Table 77B. Encoded NOV77 protein sequence (SEQ ID NO: 182). 

MGQENKNQTWVSEFILLGISSDWGIQVSLFALILAMYLVTILGNTLILLLIRLDNRLHTP
MYFSLSVLSFVDFCYTKSIVPQMLSHLLSARKSIPFYSCVLQLYVSLALCGSEFFLLGAM
AYDRYVAVCHPLHYTVIMHGGLCLGLAASRLVAGFSNSLMETIITFQLLSPSSFLCHGVI
NHFVCETLAVLQLACVDVPFNKVMVAISGFLVILLPCSLVLFSYACIVATILCIRSTQVR
CKAFGTCASHLIVVCMCFGATICTYLGPQLASSAEEEKMIALFYGVVSPMLNPLIYSLRN
KEVTAAVRKVLERCR



A search of sequence databases reveals that the NOV77 amino acid sequence has
180 of 305 amino acid residues (59%) identical to, and 229 of 305 amino acid residues (75%)
similar to, the 317 amino acid residue ptnr:SWISSPROT-ACC:Q95156 protein from Canis
familiaris (Dog) (OLFACTORY RECEPTOR-LIKE PROTEIN OLF3). Public amino acid
databases include the GenBank databases, SwissProt, PDB and PIR.

The disclosed NOV77 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 77C.



Table 77C. BLAST results for NOV77

Columns: Gene Index/Identifier; Protein/Organism; Length (aa); Identity (%); Positives (%); Expect

gi|18480604|gb|AAL61316.1| (AY073653)
olfactory receptor MOR257-3 [Mus musculus]
Length: 310; Identity: 239/315 (75%); Positives: 268/315 (84%); Expect: e-112

gi|2495055|sp|Q95156|OLF3_CANFA
OLFACTORY RECEPTOR-LIKE PROTEIN OLF3
Length: 317; Identity: 178/305 (58%); Positives: 226/305 (73%); Expect: e-82

gi|6912558|ref|NP_036501.1| (NM_012369)
olfactory receptor, family 2, subfamily F, member 1; olfactory receptor, family 2, subfamily F, member 5; olfactory receptor, family 2, subfamily F, member 4 [Homo sapiens]
Length: 317; Identity: 173/305 (56%); Positives: 221/305 (71%); Expect: e-79

gi|9297120|sp|Q13607|O2F1_HUMAN
OLFACTORY RECEPTOR 2F1 (OLFACTORY RECEPTOR-LIKE PROTEIN OLF3)
Length: 317; Identity: 173/305 (56%); Positives: 221/305 (71%); Expect: e-79

gi|18479500|gb|AAL60764.1| (AY073101)
olfactory receptor MOR257-1 [Mus musculus]
Length: 313; Identity: 176/305 (57%); Positives: 219/305 (71%); Expect: 3e-79





Table 77D lists the domain descriptions from DOMAIN analysis results against 
NOV77. This indicates that the NOV77 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 77D. Domain Analysis of NOV77 

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin
family).

CD-Length = 254 residues, 94.1% aligned 
Score = 75.5 bits (184), Expect = 4e-15 



The disclosed NOV77 nucleic acid of the invention encoding a GPCR-like protein 
includes the nucleic acid whose sequence is provided in Table 77A or a fragment thereof. The 
invention also includes a mutant or variant nucleic acid any of whose bases may be changed 
from the corresponding base shown in Table 77A while still encoding a protein that maintains 
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The 
invention further includes nucleic acids whose sequences are complementary to those just 
described, including nucleic acid fragments that are complementary to any of the nucleic acids 
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or 
complements thereto, whose structures include chemical modifications. Such modifications 
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least 
in part to enhance the chemical stability of the modified nucleic acid, such that they may be 
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 28 percent of the 
bases may be so changed. 

The disclosed NOV77 protein of the invention includes the GPCR-like protein whose 
sequence is provided in Table 77B. The invention also includes a mutant or variant protein
any of whose residues may be changed from the corresponding residue shown in Table 77B 
while still encoding a protein that maintains its GPCR-like activities and physiological 
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 41 
percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this GPCR-like protein
(NOV77) may function as a member of a "GPCR family". Therefore, the NOV77 nucleic
acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV77 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein
(NOV77) may be useful in gene therapy, and the GPCR-like protein (NOV77) may be useful
when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD;
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease,
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders;
psoriasis, colon cancer, leukemia, AIDS; thalamus disorders; metabolic disorders including
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic
disorders including pancreatic insufficiency and cancer; and prostate disorders including
prostate cancer, or other pathologies or conditions. The NOV77 nucleic acid encoding the
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV77 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immunospecifically to the novel NOV77 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV77 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding the pathology of the disease and in developing new drug targets for various
disorders.



NOV78 

A disclosed NOV78 nucleic acid of 1151 nucleotides (also referred to as CG57425-01)
encoding a GPCR-like protein is shown in Table 78A. The start and stop codons are in bold 
letters. 



Table 78A. NOV78 nucleotide sequence (SEQ ID NO: 183). 

ggacactggtttgggccatatggatggtgagcaatgtatgacctgattctgttcattaag

ataagctttatgtctcctactctaagaaactcttcaattttcatcattctcaccttttgc 

ctttaggttttccgaaggtcaacaatgaaaaacagaaccatgtttggtgagtttattcta

ctgggccttacaaatcaacctgaactccaagtgatgatattcatctttctgttcctcacc 

tacatgctaagtgtcctagg7^aatctgactattatcaccctcaccttactagacccccac 

ctccagacccccatgtatttcttcctccggaatttctccttcttagaaatttccttcaca 

tccatttttattcccagatttctgaccagcatgacaacaggaaataaagttatcagcttt 

gctggctgcttgactcagtatttttttgctatatttcttggagctaccgagttttacctc 

ctggcctccatgtcttatgatcgttatgtggccatctgcaaacccttgcattacctgact 

attatgagcagcagagtctgcatacaactagtgttctgctcctggttggggggattccta 

gcaatcttaccaccaatcatcctgatgacccaggtagatttctgtgtctccaacattctg 

aatcactattactgtgactatgggcctctcgtggagcttgcctgctcagacacaagcctc 

ttagaactgatggtcatcctcttggccgttgtgactctcatggttactctggtgctggtg 

acactttcttacacatacattatcaggactattctgaggatcccttccgcccagcaaagg 

acaaaggccttttccacttgttcctcccacatgattgtcatctccctctcttatggcagc 

tgcatgtttatgtacattaatccttctgcaaaagaaggaggtgctttcaacaaaggaata 

gctgtactcattacttcggttactcccttactgaatcccttcatatatactttaagaaat 

cagcaagtgaaacaagctttcaaggactcagtcaaaaagattgtgaaactttaaaaaagg 

agattacacttcaaaatacattttcacttaacaaatatgcattgaatgtctatatttcaa 

gtgctaaattg



In a search of public sequence databases, the NOV78 nucleic acid sequence, located on
chromosome 12, has 605 of 931 bases (64%) identical to a
gb:GENBANK-ID:AF102523|acc:AF102523.1 mRNA from Mus musculus (Mus musculus olfactory receptor
C6 gene, complete cds). Public nucleotide databases include all GenBank databases and the
GeneSeq patent database.

The disclosed NOV78 polypeptide (SEQ ID NO: 184) encoded by SEQ ID NO: 183 
has 309 amino acid residues and is presented in Table 78B using the one-letter amino acid 
code. SignalP, Psort and/or Hydropathy results predict that NOV78 has a signal peptide and
is likely to be localized at the plasma membrane with a certainty of 0.6000. The most likely 
cleavage site for a NOV78 peptide is at amino acid 39. 

Table 78B. Encoded NOV78 protein sequence (SEQ ID NO:184). 

MKNRTMFGEFILLGLTNQPELQVMIFIFLFLTYMLSVLGNLTIITLTLLDPHLQTPMYFF
LRNFSFLEISFTSIFIPRFLTSMTTGNKVISFAGCLTQYFFAIFLGATEFYLLASMSYDR
YVAICKPLHYLTIMSSRVCIQLVFCSWLGGFLAILPPIILMTQVDFCVSNILNHYYCDYG
PLVELACSDTSLLELMVILLAVVTLMVTLVLVTLSYTYIIRTILRIPSAQQRTKAFSTCS
SHMIVISLSYGSCMFMYINPSAKEGGAFNKGIAVLITSVTPLLNPFIYTLRNQQVKQAFK 
DSVKKIVKL 



A search of sequence databases reveals that the NOV78 amino acid sequence has 175
of 309 amino acid residues (56%) identical to, and 222 of 309 amino acid residues (71%)
similar to, the 313 amino acid residue ptnr:SPTREMBL-ACC:Q9Z1V0 protein from Mus
musculus (Mouse) (OLFACTORY RECEPTOR C6). Public amino acid databases include the
GenBank databases, SwissProt, PDB and PIR.

NOV78 is expressed in at least the following tissues: apical microvilli of the retinal pigment epithelium,
arterial (aortic), basal forebrain, brain, Burkitt lymphoma cell lines, corpus callosum, cardiac
(atria and ventricle), caudate nucleus, CNS and peripheral tissue, cerebellum, cerebral cortex,
colon, cortical neurogenic cells, endothelial (coronary artery and umbilical vein) cells, palate
epithelia, eye, neonatal eye, frontal cortex, fetal hematopoietic cells, heart, hippocampus,
hypothalamus, leukocytes, liver, fetal liver, lung, lung lymphoma cell lines, fetal lymphoid
tissue, adult lymphoid tissue, those that express MHC II and III, nervous, medulla,
subthalamic nucleus, ovary, pancreas, pituitary, placenta, pons, prostate, putamen, serum,
skeletal muscle, small intestine, smooth muscle (coronary artery in aortic), spinal cord, spleen,
stomach, taste receptor cells of the tongue, testis, thalamus, and thymus tissue. This
information was derived by determining the tissue sources of the sequences that were included
in the invention including but not limited to SeqCalling sources, public EST sources,
literature sources, and/or RACE sources.

The disclosed NOV78 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 78C. 

Table 78C. BLAST results for NOV78

Gene Index/Identifier                      Protein/Organism                                  Length (aa)  Identity (%)   Positives (%)  Expect
gi|17474307|ref|XP_062466.1| (XM_062466)   similar to olfactory receptor 49 [Homo sapiens]   309          308/309 (99%)  309/309 (99%)  e-141
gi|18480848|gb|AAL61438.1| (AY073775)      olfactory receptor MOR115-4 [Mus musculus]        309          238/309 (77%)  273/309 (88%)  e-114
gi|18479958|gb|AAL60993.1| (AY073330)      olfactory receptor MOR115-1 [Mus musculus]        309          239/309 (77%)  274/309 (88%)  e-112
gi|17474309|ref|XP_062467.1| (XM_062467)   similar to olfactory receptor 49 [Homo sapiens]   309          247/309 (79%)  278/309 (89%)  e-112
gi|18479614|gb|AAL60821.1| (AY073158)      olfactory receptor MOR114-1 [Mus musculus]        312          207/306 (67%)  256/306 (83%)  e-110


Table 78D lists the domain descriptions from DOMAIN analysis results against 
NOV78. This indicates that the NOV78 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 78D. Domain Analysis of NOV78 

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin family).

CD-Length = 254 residues, 100.0% aligned 

Score = 90.1 bits (222), Expect = 2e-19 
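
The score and Expect value reported for the domain hit are related to one another. Assuming the reported score is a normalized bit score S' and that standard Karlin-Altschul statistics apply (E = m*n*2^(-S'), where m*n is the effective search space), the search space implied by the two figures can be sketched as follows. The function name is hypothetical, and the result is only an order-of-magnitude estimate because the reported Expect value is rounded.

```python
def effective_search_space(bit_score: float, expect: float) -> float:
    """Estimate m*n from E = m*n * 2**(-bit_score) (assumes a normalized bit score)."""
    return expect * 2.0 ** bit_score


# Values reported for the 7tm_1 domain hit in Table 78D.
space = effective_search_space(90.1, 2e-19)
print(f"implied effective search space: {space:.2e}")
```

Under the same assumptions, a higher bit score against the same search space corresponds to a proportionally smaller Expect value.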



G-Protein Coupled Receptors (GPCRs) have been identified as an extremely large
family of protein receptors in a number of species. At the phylogenetic level they can be
classified into four major subfamilies. These receptors share a seven transmembrane domain
structure with many neurotransmitter and hormone receptors. They are likely to be involved in
the recognition and transduction of various signals mediated by G-Proteins, hence their name
G-Protein Coupled Receptors. The human GPCR genes are generally intron-less and belong to
four gene subfamilies, displaying great sequence variability. These genes are dominantly
expressed in olfactory epithelium.

Olfactory receptors (ORs) have been identified as an extremely large family of GPCRs in
a number of species. As members of the GPCR family, these receptors share a seven
transmembrane domain structure with many neurotransmitter and hormone receptors, and are
likely to underlie the recognition and G-protein-mediated transduction of odorant signals. Like
other GPCRs, the ORs can be expressed in a variety of tissues where they are thought to be
involved in recognition and transmission of a variety of signals. The human OR genes are
typically intron-less and belong to four different gene subfamilies, displaying great sequence
variability. These genes are dominantly expressed in olfactory epithelium.

The disclosed NOV78 nucleic acid of the invention encoding a GPCR-like protein
includes the nucleic acid whose sequence is provided in Table 78A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 78A while still encoding a protein that maintains
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 36 percent of the
bases may be so changed.
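
The "up to about 36 percent" limit can be read as a bound on the fraction of positions that differ from the reference sequence. A minimal sketch under stated assumptions follows: the two sequences are taken to be already aligned and of equal length (no insertions or deletions), and the helper names are illustrative only and do not appear in the original disclosure.

```python
def fraction_changed(reference: str, variant: str) -> float:
    """Fraction of positions at which two equal-length, pre-aligned sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to equal length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    differences = sum(1 for a, b in zip(reference.upper(), variant.upper()) if a != b)
    return differences / len(reference)


def within_limit(reference: str, variant: str, max_fraction: float = 0.36) -> bool:
    """True if no more than max_fraction of the positions have been changed."""
    return fraction_changed(reference, variant) <= max_fraction


# Toy 10-base example (not a NOV78 sequence): 3 of 10 bases differ.
print(within_limit("ATGGCCATTG", "ATGACCTTTA"))
```

The same comparison applies to protein variants by passing amino acid sequences and a different `max_fraction`.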

The disclosed NOV78 protein of the invention includes the GPCR-like protein whose
sequence is provided in Table 78B. The invention also includes a mutant or variant protein
any of whose residues may be changed from the corresponding residue shown in Table 78B
while still encoding a protein that maintains its GPCR-like activities and physiological
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 44
percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this GPCR-like protein
(NOV78) may function as a member of a "GPCR family". Therefore, the NOV78 nucleic 
acids and proteins identified here may be useful in potential therapeutic applications 
implicated in (but not limited to) various pathologies and disorders as indicated below. The 
potential therapeutic applications for this invention include, but are not limited to: protein 
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug 
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene 
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues 
and cell types composing (but not limited to) those defined here. 

The NOV78 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein 
(NOV78) may be useful in gene therapy, and the GPCR-like protein (NOV78) may be useful 
when administered to a subject in need thereof. By way of nonlimiting example, the 
compositions of the present invention will have efficacy for treatment of patients suffering 
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD; 
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease, 
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders; 
psoriasis; colon cancer; leukemia; AIDS; thalamus disorders; metabolic disorders including
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic
disorders including pancreatic insufficiency and cancer; and prostate disorders including
prostate cancer, or other pathologies or conditions. The NOV78 nucleic acid encoding the
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV78 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV78 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV78 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 
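
One common way to read hydrophilic regions off a hydrophobicity chart is a sliding-window average of per-residue hydropathy values. The sketch below uses the Kyte-Doolittle scale. The window size, the cutoff, and the function names are illustrative assumptions and are not taken from the original disclosure, which does not specify the method used.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each full window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophilic_window_starts(protein: str, window: int = 9, cutoff: float = -1.0) -> list[int]:
    """0-based start positions of windows whose mean hydropathy is below the cutoff."""
    profile = hydropathy_profile(protein, window)
    return [i for i, mean in enumerate(profile) if mean < cutoff]


# C-terminal stretch of the NOV78 sequence from Table 78B.
tail = "LRNQQVKQAFKDSVKKIVKL"
print(hydrophilic_window_starts(tail))
```

Windows flagged this way are only candidate immunogenic regions; the choice of window and cutoff changes which stretches are reported.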

NOV79

A disclosed NOV79 nucleic acid of 1601 nucleotides (also referred to as CG57753-01) 
encoding a GPCR-like protein is shown in Table 79A. The start and stop codons are in bold 
letters. 



Table 79A. NOV79 nucleotide sequence (SEQ ID NO:185). 

CTGGTTCTTTGAGTGAGTTATTCCTGGATTCTAGGAGCTCACAGTAGAGTGTTTCAGAAT 

GGCAAATATCTAAACATTAGCCGGTAATTTTATGCTCCGTATACTGGGTACTAATTTACA 

TAAACATATAAGTAAAGTCTACACATATGAGACTGTTTTCTTGATAGATCATGGAAGGAA 

AAATCCATTCAGGGAAAAAAAAGGGAAATACTATATAAATGTCAAAAATCCAGTCTTTTT

AAGAGACATTCTCTGGAAATATCTCTATTTTGAGGTGTAGTAGATTATCTTACATATATA 

TCCACTCACACATACCTTCCAGTTAGAACACTGAAGCCTCATCATTGTAATTAAAGCAAT 

AAATTTTGTAAAAATGAAAAGGATAATTGTGGGAGGAGATTCTAAACACTCCTTTTCTAA 

TGAGCTGCTCTGTGTCGCCAGGGGAAACATGGTTGAGTAAGGCATCACATTTTTGACATG 

GAGCTTCTGACAAATAATCTCAAATTTATCATTGACCCTTTTGTTTACAGGTTCTGACAC 

CTTAGTCCAATACCTTCAGAAGAACACATGGAAAATAGGAAAAATTGACTTAATTCATCC 

TCTTGGGGCTCACACAGAACCCTGAGGGCCAAAAAGTTTTATTTGTCACATTCTTACTCA 

TCTACATTGTGACGATAATGGGCAACCTCCTTATCATGGTGACCATCATGGCCA^rrAriT

CCCTGGGTTCCCCCATGTACTTTTTTCTGGCTTCTTTATCATTTATACATACCGTCTATT 

ATACTGCCATTGCTCCCAAAATGATTGTTGACCTGCTCTCTGAGAAAAAGACCATTTCTT 

TTCAGGGTTGTATGGCTCAACTTTTTATGGATCATTTATTTGCTGGTGCTGAGGTCATTC 

TTCTGGTGGTAATGGCCTATGATCAATATGTGGCCATCTGTAAGCCTCTTCATTATTTGA 

TCATCATGAATCGTCGAGTCTGTGTTCTCATGCTGTTGGTGGCCTGGATTGGAGGCTTTC 

TTCACTCATTGGTTCAATTTCTCTTTATTTATCAGCTCCCTTTCTGTGGACCCAATGTCA 

TTGACAACTTCCTGTGTGATTTGTATCCCTTATTGAAACTTGCTTGCACCAATACCTATG 

TCACTGGGCTTTCTATGATAGCTAATGGTGGAGCGATTTGTACTGTCACCTTCTTCCCTC 

TCCTGCTTTCCTATGGGGTCATATTACCCTCTCTTAAGACTCAGAGTTTGGAAGGGAAAT 

GCAAAGCTTTCTACACCTGTGCATCCCACATCACTGTGATCACTTTATTCTTTGTCCCCT 

GCATCTTCCTGTTTGTAAGGCCCAACTCCACCTTTCCCATTGATAAATCCATGACTGTGG 

TTTTAACTTGTATAACTCCCATGCTGAAACCACTAATCTATGCCCTGAGGAATGCAGAAA 

TGAAAAGTGCCATGAGGAAACTTTGGAGTGAAAAAGTAAGCTTAGCTGGAAAAGGGCTGT 

ATCCCTCATGAGAATATGACTTTCATTCTTTCACAGAAGCAAGGAATAATTTCACTATCC

TATCAGATTACATTTCTGTTATCATTCGCCTTTAGTTATTT 



In a search of public sequence databases, the NOV79 nucleic acid sequence, located on
chromosome 12, has 605 of 931 bases (64%) identical to a
gb:GENBANK-ID:AF102523|acc:AF102523.1 mRNA from Mus musculus (Mus musculus olfactory
receptor C6 gene, complete cds). Public nucleotide databases include all GenBank databases
and the GeneSeq patent database.

The disclosed NOV79 polypeptide (SEQ ID NO:186) encoded by SEQ ID NO:185 has
277 amino acid residues and is presented in Table 79B using the one-letter amino acid code.
SignalP, PSORT and/or hydropathy results predict that NOV79 has a signal peptide and is
likely to be localized at the plasma membrane with a certainty of 0.6000. The most likely
cleavage site for a NOV79 peptide is at amino acid position 39.

Table 79B. Encoded NOV79 protein sequence (SEQ ID NO:186).

MGNLLIMVTIMASQSLGSPMYFFLASLSFIHTVYYTAIAPKMIVDLLSEKKTISFQGCMA
QLFMDHLFAGAEVILLVVMAYDQYVAICKPLHYLIIMNRRVCVLMLLVAWIGGFLHSLVQ
FLFIYQLPFCGPNVIDNFLCDLYPLLKLACTNTYVTGLSMIANGGAICTVTFFPLLLSYG
VILPSLKTQSLEGKCKAFYTCASHITVITLFFVPCIFLFVRPNSTFPIDKSMTVVLTCIT
PMLKPLIYALRNAEMKSAMRKLWSEKVSLAGKGLYPS



A search of sequence databases reveals that the NOV79 amino acid sequence has 175
of 309 amino acid residues (56%) identical to, and 222 of 309 amino acid residues (71%)
similar to, the 313 amino acid residue ptnr:SPTREMBL-ACC:Q9Z1V0 protein from Mus
musculus (Mouse) (OLFACTORY RECEPTOR C6). Public amino acid databases include the
GenBank databases, SwissProt, PDB and PIR.

NOV79 is expressed in at least apical microvilli of the retinal pigment epithelium,
arterial (aortic), basal forebrain, brain, Burkitt lymphoma cell lines, corpus callosum, cardiac
(atria and ventricle), caudate nucleus, CNS and peripheral tissue, cerebellum, cerebral cortex,
colon, cortical neurogenic cells, endothelial (coronary artery and umbilical vein) cells, palate
epithelia, eye, neonatal eye, frontal cortex, fetal hematopoietic cells, heart, hippocampus,
hypothalamus, leukocytes, liver, fetal liver, lung, lung lymphoma cell lines, fetal lymphoid
tissue, adult lymphoid tissue, those that express MHC II and III, nervous, medulla,
subthalamic nucleus, ovary, pancreas, pituitary, placenta, pons, prostate, putamen, serum,
skeletal muscle, small intestine, smooth muscle (coronary artery in aortic), spinal cord, spleen,
stomach, taste receptor cells of the tongue, testis, thalamus, and thymus tissue. This
information was derived by determining the tissue sources of the sequences that were included
in the invention, including but not limited to SeqCalling sources, public EST sources,
literature sources, and/or RACE sources.

The disclosed NOV79 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 79C. 



Table 79C. BLAST results for NOV79

Gene Index/Identifier                      Protein/Organism                                Length (aa)  Identity (%)   Positives (%)  Expect
gi|17459952|ref|XP_062090.1| (XM_062090)   similar to odorant receptor 16 [Homo sapiens]   277          275/277 (99%)  276/277 (99%)  e-133
gi|17460091|ref|XP_062159.1| (XM_062159)   similar to odorant receptor 16 [Homo sapiens]   277          275/277 (99%)  275/277 (99%)  e-132
gi|17460099|ref|XP_062161.1| (XM_062161)   similar to odorant receptor 16 [Homo sapiens]   722          242/266 (90%)  251/266 (93%)  e-119
0778.1| (AY073115)                         olfactory receptor MOR231-2 [Mus musculus]      i 14         226/277 (81%)  247/277 (88%)  e-109
gi|18479534|gb|AAL60781.1| (AY073118)      olfactory receptor MOR231-3 [Mus musculus]      305          186/265 (70%)  223/265 (83%)  7e-94



Table 79D lists the domain descriptions from DOMAIN analysis results against 
NOV79. This indicates that the NOV79 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 79D. Domain Analysis of NOV79 

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin family).

CD-Length = 254 residues, 100.0% aligned 
Score = 91.3 bits (225), Expect = 7e-20 



G-Protein Coupled Receptors (GPCRs) have been identified as an extremely large
family of protein receptors in a number of species. At the phylogenetic level they can be 
classified into four major subfamilies. These receptors share a seven transmembrane domain 
structure with many neurotransmitter and hormone receptors. They are likely to be involved in 
the recognition and transduction of various signals mediated by G-Proteins, hence their name 
G-Protein Coupled Receptors. The human GPCR genes are generally intron-less and belong to 
four gene subfamilies, displaying great sequence variability. These genes are dominantly 
expressed in olfactory epithelium. 

Olfactory receptors (ORs) have been identified as an extremely large family of GPCRs in
a number of species. As members of the GPCR family, these receptors share a seven
transmembrane domain structure with many neurotransmitter and hormone receptors, and are
likely to underlie the recognition and G-protein-mediated transduction of odorant signals. Like
other GPCRs, the ORs can be expressed in a variety of tissues where they are thought to be
involved in recognition and transmission of a variety of signals. The human OR genes are
typically intron-less and belong to four different gene subfamilies, displaying great sequence
variability. These genes are dominantly expressed in olfactory epithelium.

The disclosed NOV79 nucleic acid of the invention encoding a GPCR-like protein includes
the nucleic acid whose sequence is provided in Table 79A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 79A while still encoding a protein that maintains
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 36 percent of the
bases may be so changed.
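
For a nucleic acid written 5' to 3', the complementary strand referred to above is obtained by reverse-complementing the sequence. A minimal sketch follows, assuming an unmodified DNA alphabet (A, C, G, T, plus N for unknown bases); the function name is illustrative and not part of the original disclosure.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    allowed = set("ACGTNacgtn")
    unexpected = set(dna) - allowed
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return dna.translate(_COMPLEMENT)[::-1]


# First 20 bases of the NOV79 sequence in Table 79A.
print(reverse_complement("CTGGTTCTTTGAGTGAGTTA"))
```

Applying the function twice returns the original sequence, which is a convenient sanity check on any complement computed this way.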

The disclosed NOV79 protein of the invention includes the GPCR-like protein whose
sequence is provided in Table 79B. The invention also includes a mutant or variant protein
any of whose residues may be changed from the corresponding residue shown in Table 79B
while still encoding a protein that maintains its GPCR-like activities and physiological
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 44
percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this GPCR-like protein
(NOV79) may function as a member of a "GPCR family". Therefore, the NOV79 nucleic
acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV79 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein
(NOV79) may be useful in gene therapy, and the GPCR-like protein (NOV79) may be useful
when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD;
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease,
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders;
psoriasis; colon cancer; leukemia; AIDS; thalamus disorders; metabolic disorders including
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic
disorders including pancreatic insufficiency and cancer; and prostate disorders including
prostate cancer, or other pathologies or conditions. The NOV79 nucleic acid encoding the
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV79 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV79 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV79 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.

NOV80 

A disclosed NOV80 nucleic acid of 1006 nucleotides (also referred to as CG56766-01)
encoding a GPCR-like protein is shown in Table 80A. The start and stop codons are in bold
letters.



Table 80A. NOV80 nucleotide sequence (SEQ ID NO:187).

CCCTTTCCTCTTGCTCTTTGATGTTTTGTAGGCCTGCAGCTCCCAAGCACAGAGGCATGAGTGGGGAGAA
TGTCACCAAGGTCAGCACCTTCATCCTGGTGGGCCTCCCCACGGCCCCAGGGCTGCAGTACCTGCTCTTC 
CTCCTCTTCCTGCTCACCTACCTCTTTGTCCTGGTGGAGAACCTGGCCATCATCCTCATCGTCTGGAGCA 
GCACCTCCCTCCACAGGCCCATGTACTACTTTCTGAGCTCCATGTCTTTCCTGGAGATCTGGTACGTGTC 
TGACATCACCCCCAAGATGCTGGAGGGCTTCCTCCTCCAGCAGAAACGCATCTCTTTCGTCGGGTGCATG 
ACGCAGCTCTACTTCTTCAGCTCCCTGGTGTGCACCGAGTGTGTGCTTCTGGCCTCCATGGCCTACGACC 
GCTACGTGGCCATCTGCCACCCGCTGCGCTACCACGTCCTTGTGACCCCGGGGCTGTGCCTCCAGCTGGT 
GGGCTTCTCCTTTGTGAGTGGCTTCACCATCTCCATGATCAAGGTCTGTTTTATCTCCAGCGTCACGTTC 
TGTGGCTCCAACGTCTTGAACCACTTCTTCTGTGACATTTCCCCCATCCTCAAGCTGGCCTGCACGGACT 
TCTCCACTGCAGAGCTGGTGGATTTCATCCTGGCCTTCATCATCCTGGTGTTTCCGCTCCTGGCCACCAT 
ACTGTCATATTGGCACATCACCCTGGCTGTCCTGCGCATCCCCTCGGCCACCGGCTGCTGGAGAGCCTTC 
TCTACCTGCGCCTCTCACCTCACCGTGGTCACCGTCTTCTATACAGCCTTGCTTTTCATGTATGTCCGGC 
CCCAAGCCATTGATTCCCAGAGCTCCAACAAGCTCATCTCTGCCGTGTACACTGTTGTCACGCCAATAAT
TAACCCTTTGATTTACTGCCTGAGGAACAAGGAATTTAAGGACGCCTTGAAAAAGGCCTTGGGCTTGGGT 
CAAACTTCACACTAAGACAACTAAAT 



The disclosed NOV80 polypeptide (SEQ ID NO: 188) encoded by SEQ ID NO: 187 has 
324 amino acid residues and is presented in Table 80B using the one-letter amino acid code. 



Table 80B. Encoded NOV80 protein sequence (SEQ ID NO:188). 



MFCRPAAPKHRGMSGENVTKVSTFILVGLPTAPGLQYLLFLLFLLTYLFVLVENLAIILIVWSSTSLHRP
MYYFLSSMSFLEIWYVSDITPKMLEGFLLQQKRISFVGCMTQLYFFSSLVCTECVLLASMAYDRYVAICH
PLRYHVLVTPGLCLQLVGFSFVSGFTISMIKVCFISSVTFCGSNVLNHFFCDISPILKLACTDFSTAELV
DFILAFIILVFPLLATILSYWHITLAVLRIPSATGCWRAFSTCASHLTVVTVFYTALLFMYVRPQAIDSQ
SSNKLISAVYTVVTPIINPLIYCLRNKEFKDALKKALGLGQTSH
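
The Table 80B protein sequence corresponds to the open reading frame of the Table 80A nucleotide sequence, read from the start codon to the first in-frame stop codon. A minimal sketch follows, assuming the standard genetic code and that the first ATG in the sequence is the start codon (true of Table 80A as printed, but not of every sequence in this document); the function name is illustrative and not part of the original disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate_first_orf(dna: str) -> str:
    """Translate from the first ATG to the first in-frame stop codon."""
    dna = "".join(dna.split()).upper()  # drop whitespace left by line wrapping
    start = dna.find("ATG")
    if start < 0:
        raise ValueError("no ATG start codon found")
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 70-base line of Table 80A; the stop codon lies beyond this excerpt.
print(translate_first_orf(
    "CCCTTTCCTCTTGCTCTTTGATGTTTTGTAGGCCTGCAGCTCCCAAGCACAGAGGCATGAGTGGGGAGAA"
))
```

Passing the full Table 80A sequence instead of the excerpt gives a translation that can be compared residue by residue against Table 80B.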



A search of sequence databases reveals that the NOV80 amino acid sequence has
215/305 (70%) identity and 253/305 (82%) similarity with TREMBLNEW-ACC:AAG45189
M51 OLFACTORY RECEPTOR - Mus musculus. Public amino acid databases include the
GenBank databases, SwissProt, PDB and PIR.

The disclosed NOV80 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 80C. 



Table 80C. BLAST results for NOV80

Gene Index/Identifier                      Protein/Organism                                  Length (aa)  Identity (%)    Positives (%)   Expect
gi|17435888|ref|XP_065377.1| (XM_065377)   similar to olfactory receptor 41 [Homo sapiens]   312          312/312 (100%)  312/312 (100%)  e-142
gi|17435880|ref|XP_065375.1| (XM_065375)   similar to olfactory receptor 41 [Homo sapiens]   331          293/307 (95%)   300/307 (97%)   e-132
gi|18479396|gb|AAL60712.1| (AY073049)      olfactory receptor MOR103-2 [Mus musculus]        312          271/309 (87%)   289/309 (92%)   e-123
gi|18479398|gb|AAL60713.1| (AY073050)      olfactory receptor MOR103-3 [Mus musculus]        312          268/310 (86%)   288/310 (92%)   e-123
gi|12007416|gb|AAG45189.1| (AF321234)      m51 olfactory receptor [Mus musculus]             314          215/305 (70%)   253/305 (82%)   e-102



Table 80D lists the domain descriptions from DOMAIN analysis results against 
NOV80. This indicates that the NOV80 sequence has properties similar to those of other 
proteins known to contain this domain. 

Table 80D. Domain Analysis of NOV80

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin family).

CD-Length = 254 residues, 100.0% aligned
Score = 98.6 bits (244), Expect = 5e-22



The disclosed NOV80 nucleic acid of the invention encoding a GPCR-like protein
includes the nucleic acid whose sequence is provided in Table 80A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 80A while still encoding a protein that maintains
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 5 percent of the
bases may be so changed.

The disclosed NOV80 protein of the invention includes the GPCR-like protein whose
sequence is provided in Table 80B. The invention also includes a mutant or variant protein
any of whose residues may be changed from the corresponding residue shown in Table 80B
while still encoding a protein that maintains its GPCR-like activities and physiological
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 30
percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this GPCR-like protein
(NOV80) may function as a member of a "GPCR family". Therefore, the NOV80 nucleic
acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV80 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein 
(NOV80) may be useful in gene therapy, and the GPCR-like protein (NOV80) may be useful 
when administered to a subject in need thereof. By way of nonlimiting example, the 
compositions of the present invention will have efficacy for treatment of patients suffering 
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD; 
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease, 
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders; 
psoriasis; colon cancer; leukemia; AIDS; thalamus disorders; metabolic disorders including
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic
disorders including pancreatic insufficiency and cancer; and prostate disorders including
prostate cancer, or other pathologies or conditions. The NOV80 nucleic acid encoding the
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV80 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV80 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV80 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 



NOV81

NOV81 includes two GPCR-like proteins disclosed below. The disclosed sequences
have been named NOV81a and NOV81b.

NOV81a



A disclosed NOV81a nucleic acid of 1039 nucleotides (also referred to as CG57847-01)
encoding a GPCR-like protein is shown in Table 81A. The start and stop codons are in
bold letters.



Table 81A. NOV81a nucleotide sequence (SEQ ID NO:189). 



GAATGATGCCCTTTTGCCACAATATAATTAATATTTCCTGTGTGAAAAACAACTGGTCAAATGATGTCCG 
TGCTTCCCTGTACAGTTTAATGGTGCTCATAATTCTGACCACACTCGTTGGCAATCTGATAGTTATTGTT 
TCTATATCACACTTCAAACAACTTCATACCCCAACAAATTGGCTCATTCATTCCATGGCCACTGTGGACT 
TTCTTCTGGGGTGTCTGGTCATGCCTTACAGTATGGTGAGATCTGCTGAGCACTGTTGGTATTTTGGAGA
AGTCTTCTGTAAAATTCACACAAGCACCGACATTATGCTGAGCTCAGCCTCCATTTTCCATTTGTCTTTC 
ATCTCCATTGACCGCTACTATGCTGTGTGTGATCCACTGAGATATAAAGCCAAGATGAATATCTTGGTTA 
TTTGTGTGATGATCTTCATTAGTTGGAGTGTCCCTGCTGTTTTTGCATTTGGAATGATCTTTCTGGAGCT 
AAACTTCAAAGGCGCTGAAGAGATATATTACAAACATGTTCACTGCAGAGGAGGTTGCTCTGTCTTCTTT 
AGCAAAATATCTGGGGTACTGACCTTTATGACTTCTTTTTATATACCTGGATCTATTATGTTATGTGTCT 
ATTACAGAATATATCTTATCGCTAAAGAACAGGCAAGATTAATTAGTGATGCCAATCAGAAGCTCCAAAT 
TGGATTGGAAATGAAAAATGGAATTTCACAAAGCAAAGAAAGGAAAGCTGTGAAGACATTGGGGATTGTG 
ATGGGAGTTTTCCTAATATGCTGGTGCCCTTTCTTTATCTGTACAGTCATGGACCCTTTTCTTCACTACA 
TTATTCCACCTACTTTGAATGATGTATTGATTTGGTTTGGCTACTTGAACTCTACATTTAATCCAATGGT 
TTATGCATTTTTCTATCCTTGGTTTAGAAAAGCACTGAAGATGATGCTGTTTGGTAAAATTTTCCAAAAA 
GATTCATCCAGGTGTAAATTATTTTTGGAATTGAGTTCATAGAATTATTATATTTTACT 



The disclosed NOV81a polypeptide (SEQ ID NO:190) encoded by SEQ ID NO:189
has 339 amino acid residues and is presented in Table 81B using the one-letter amino acid
code.



Table 81B. Encoded NOV81a protein sequence (SEQ ID NO:190). 



MMPFCHNIINISCVKNNWSNDVRASLYSLMVLIILTTLVGNLIVIVSISHFKQLHTPTNWLIHSMATVDF
LLGCLVMPYSMVRSAEHCWYFGEVFCKIHTSTDIMLSSASIFHLSFISIDRYYAVCDPLRYKAKMNILVI 
CVMIFISWSVPAVFAFGMIFLELNFKGAEEIYYKHVHCRGGCSVFFSKISGVLTFMTSFYIPGSIMLCVY 
YRIYLIAKEQARLISDANQKLQIGLEMKNGISQSKERKAVKTLGIVMGVFLICWCPFFICTVMDPFLHYI 
IPPTLNDVLIWFGYLNSTFNPMVYAFFYPWFRKALKMMLFGKIFQKDSSRCKLFLELSS 



A search of sequence databases reveals that the NOV81a amino acid sequence has 
152/299 (50%) identity and 206/299 (68%) similarity to SPTREMBL-ACC:Q9P1P4 G 
PROTEIN-COUPLED RECEPTOR 57 - Homo sapiens. Public amino acid databases include 
the GenBank databases, SwissProt, PDB and PIR. 
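
The identity and similarity percentages quoted above follow directly from the match counts. The sketch below recomputes them; the helper name `truncated_percent` is hypothetical, and the use of truncation rather than rounding is an assumption inferred from the figures in the text (152/299 is about 50.8 percent but is reported as 50%).

```python
def truncated_percent(matches: int, aligned: int) -> int:
    """Return matches/aligned as a whole-number percentage, discarding the fraction."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return matches * 100 // aligned


# Figures quoted for the NOV81a alignment against SPTREMBL-ACC:Q9P1P4.
identity = truncated_percent(152, 299)
similarity = truncated_percent(206, 299)
print(identity, similarity)
```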



NOV81b 

A disclosed NOV81b nucleic acid of 1039 nucleotides (also referred to as CG57847- 
02) encoding a GPCR-like protein is shown in Table 81C. The start and stop codons are in 
bold letters.

Table 81C. NOV81b nucleotide sequence (SEQ ID NO:191).

GAATGATGCCCTTTTGCCACAATATAATTAATATTTCCTGTGTGAAAAACAACTGGTCAA
ATGATGTCCGTGCTTCCCTGTACAGTTTAATGGTGCTCATAATTCTGACCACACTCGTTG 
GCAATCTGATAGTTATTGTTTCTATATCACACTTCAAACAACTTCATACCCCAACAAATT 
GGCTCATTCATTCCATGGCCACTGTGGACTTTCTTCTGGGGTGTCTGGTCATGCCTTACA 
GTATGGTGAGATCTGCTGAGCACTGTTGGTATTTTGGAGAAGTCTTCTGTAAAATTCACA 
CAAGCACCGACATTATGCTGAGCTCAGCCTCCATTTTCCATTTGTCTTTCATCTCCATTG 
ACCGCTACTATGCTGTGTGTGATCCACTGAGATATAAAGCCAAGATGAATATCTTGGTTA 
TTTGTGTGATGATCTTCATTAGTTGGAGTGTCCCTGCTGTTTTTGCATTTGGAATGATCT 
TTCTGGAGCTAAACTTCAAAGGCGCTGAAGAGATATATTACAAACATGTTCACTGCAGAG 
GAGGTTGCTCTGTCTTCTTTAGCAAAATATCTGGGGTACTGACCTTTATGACTTCTTTTT 
ATATACCTGGATCTATTATGTTATGTGTCTATTACAGAATATATCTTATCGCTAAAGAAC 
AGGCAAGATTAATTAGTGATGCCAATCAGAAGCTCCAAATTGGATTGGAAATGAAAAATG 
GAATTTCACAAAGCAAAGAAAGGAAAGCTGTGAAGACATTGGGGATTGTGATGGGAGTTT 
TCCTAATATGCTGGTGCCCTTTCTTTATCTGTACAGTCATGGACCCTTTTCTTCACTACA 
TTATTCCACCTACTTTGAATGATGTATTGATTTGGTTTGGCTACTTGAACTCTACATTTA 
ATCCAATGGTTTATGCATTTTTCTATCCTTGGTTTAGAAAAGCACTGAAGATGATGCTGT 
TTGGTAAAATTTTCCAAAAAGATTCATCCAGGTGTAAATTATTTTTGGAATTGAGTTCAT 
AGAATTATTATATTTTACT 



In a search of public sequence databases, the NOV81b nucleic acid sequence, located
on chromosome 6, has 616 of 979 bases (62%) identical to a
gb:GENBANK-ID:HSU88828|acc:U88828.1 mRNA from Homo sapiens (Homo sapiens
serotonin-4-receptor-like pseudogene). Public nucleotide databases include all GenBank
databases and the GeneSeq patent database.

The disclosed NOV81b polypeptide (SEQ ID NO:192) encoded by SEQ ID NO:191
has 339 amino acid residues and is presented in Table 81D using the one-letter amino acid code.
Signal P, Psort and/or Hydropathy results predict that NOV81b has a signal peptide and is 
likely to be localized in the plasma membrane with a certainty of 0.6000. The most likely 
cleavage site for a NOV81b peptide is between amino acids 47 and 48. 



Table 81D. Encoded NOV81b protein sequence (SEQ ID NO: 192). 

MMPFCHNIINISCVKNNWSNDVRASLYSLMVLIILTTLVGNLIVIVSISHFKQLHTPTNW
LIHSMATVDFLLGCLVMPYSMVRSAEHCWYFGEVFCKIHTSTDIMLSSASIFHLSFISID
RYYAVCDPLRYKAKMNILVICVMIFISWSVPAVFAFGMIFLELNFKGAEEIYYKHVHCRG
GCSVFFSKISGVLTFMTSFYIPGSIMLCVYYRIYLIAKEQARLISDANQKLQIGLEMKNG
ISQSKERKAVKTLGIVMGVFLICWCPFFICTVMDPFLHYIIPPTLNDVLIWFGYLNSTFN
PMVYAFFYPWFRKALKMMLFGKIFQKDSSRCKLFLELSS



A search of sequence databases reveals that the NOV81b amino acid sequence has 152
of 299 amino acid residues (50%) identical to, and 206 of 299 amino acid residues (68%)
similar to, the 343 amino acid residue ptnr:SPTREMBL-ACC:Q9P1P4 protein from Homo
sapiens (Human) (G PROTEIN-COUPLED RECEPTOR 57). Public amino acid databases
include the GenBank databases, SwissProt, PDB and PIR.

NOV81b is expressed in at least apical microvilli of the retinal pigment epithelium,
arterial (aortic), basal forebrain, brain, Burkitt lymphoma cell lines, corpus callosum, cardiac
(atria and ventricle), caudate nucleus, CNS and peripheral tissue, cerebellum, cerebral cortex,
colon, cortical neurogenic cells, endothelial (coronary artery and umbilical vein) cells, palate
epithelia, eye, neonatal eye, frontal cortex, fetal hematopoietic cells, heart, hippocampus,
hypothalamus, leukocytes, liver, fetal liver, lung, lung lymphoma cell lines, fetal lymphoid
tissue, adult lymphoid tissue, those that express MHC II and III nervous, medulla,
subthalamic nucleus, ovary, pancreas, pituitary, placenta, pons, prostate, putamen, serum,
skeletal muscle, small intestine, smooth muscle (coronary artery in aortic), spinal cord, spleen,
stomach, taste receptor cells of the tongue, testis, thalamus, and thymus tissue. This
information was derived by determining the tissue sources of the sequences that were included
in the invention including but not limited to SeqCalling sources, Public EST sources,
Literature sources, and/or RACE sources.

The disclosed NOV81b polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 81E.



Table 81E. BLAST results for NOV81b

Gene Index/Identifier                             Protein/Organism                                  Length (aa)  Identity (%)    Positives (%)   Expect
gi|10441577|gb|AAG17112.1|AF200627_1 (AF200627)   putative catecholamine receptor [Homo sapiens]    339          339/339 (100%)  339/339 (100%)  e-177
gi|17453976|ref|XP_069048.1| (XM_069048)          similar to trace amine receptor 1 [Homo sapiens]  338          338/338 (100%)  338/338 (100%)  e-176
gi|14600076|gb|AAK71237.1|AF380186_1 (AF380186)   trace amine receptor 1 [Rattus norvegicus]        332          261/334 (78%)   288/334 (86%)   e-136
gi|18182341|gb|AAL65137.1|AF421352_1 (AF421352)   trace amine receptor 1 [Rattus norvegicus]        332          261/334 (78%)   287/334 (85%)   e-136
gi|16716513|ref|NP_444435.1| (NM_053205)          trace amine receptor 1 [Mus musculus]             332          252/334 (75%)   283/334 (84%)   e-133



Table 81F lists the domain descriptions from DOMAIN analysis results against
NOV81b. This indicates that the NOV81b sequence has properties similar to those of other
proteins known to contain this domain.

Table 81F. Domain Analysis of NOV81b

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin
family). CD-Length = 254 residues, 100.0% aligned

Score = 142 bits (358), Expect = 3e-35 
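
The bit score and Expect value reported for a domain hit are linked by the usual Karlin-Altschul relation, E = m·n·2^(-S'), where S' is the bit score and m·n is the effective search space. The sketch below inverts that relation for the values above. The search space is not stated in the text, so the result is only an implied figure under this assumption, and the function name `implied_search_space` is illustrative.

```python
def implied_search_space(bit_score: float, expect: float) -> float:
    """Solve E = (m*n) * 2**(-bit_score) for the effective search space m*n."""
    return expect * 2.0 ** bit_score


# Values reported in Table 81F: 142 bits, Expect = 3e-35.
space_81f = implied_search_space(142, 3e-35)
print(f"{space_81f:.2e}")
```

A higher bit score with a comparable search space gives a smaller Expect value, which is why the hit with the highest score among the domain tables also carries the smallest Expect.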

The disclosed NOV81b nucleic acid of the invention encoding a GPCR-like protein 
includes the nucleic acid whose sequence is provided in Table 81A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 81A while still encoding a protein that maintains
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The 
invention further includes nucleic acids whose sequences are complementary to those just 
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications 
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least 
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 5 percent of the 
bases may be so changed. 
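
The "up to about 5 percent" limit can be made concrete with a small calculation. The sketch below is illustrative only: the helper names are hypothetical, "about" is treated as an exact whole-number percentage for the sake of the arithmetic, and only substitutions between equal-length sequences are counted.

```python
def max_changed_positions(length: int, percent: int) -> int:
    """Largest whole number of positions not exceeding `percent` of `length`."""
    return length * percent // 100


def fraction_changed(reference: str, variant: str) -> float:
    """Fraction of positions that differ between two equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    if not reference:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for r, v in zip(reference, variant) if r != v)
    return differences / len(reference)


# A 1039-nucleotide sequence with a 5 percent limit.
limit_bases = max_changed_positions(1039, 5)
print(limit_bases)
print(fraction_changed("ATGATGCCCTTT", "ATGATGCCCTTC"))
```

Under these assumptions a 1039-nucleotide sequence admits at most 51 substituted bases.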

The disclosed NOV81b protein of the invention includes the GPCR-like protein whose 
sequence is provided in Table 81B. The invention also includes a mutant or variant protein
any of whose residues may be changed from the corresponding residue shown in Table 81B
while still encoding a protein that maintains its GPCR-like activities and physiological 
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 50 
percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this GPCR-like protein 
(NOV81b) may function as a member of a "GPCR family". Therefore, the NOV81b nucleic 
acids and proteins identified here may be useful in potential therapeutic applications 
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV81b nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein 
(NOV81b) may be useful in gene therapy, and the GPCR-like protein (NOV81b) may be 
useful when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD;
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease, 
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders; 
psoriasis, colon cancer, leukemia, AIDS; thalamus disorders; metabolic disorders including
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic
disorders including pancreatic insufficiency and cancer; and prostate disorders including 
prostate cancer, or other pathologies or conditions. The NOV81b nucleic acid encoding the 
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic 
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV81b nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV81b substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV81b proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 
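
Prediction of hydrophilic regions from a hydrophobicity chart can be sketched as a sliding-window average over a per-residue hydropathy scale. The example below assumes the Kyte-Doolittle scale and a nine-residue window, neither of which is specified in the text; the function names are hypothetical, and the input is the fourth row of Table 81B.

```python
# Kyte-Doolittle hydropathy index per residue (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list:
    """Mean hydropathy of every window of `window` consecutive residues."""
    if len(protein) < window:
        raise ValueError("protein is shorter than the window")
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def most_hydrophilic_window(protein: str, window: int = 9):
    """Return (start index, peptide, mean score) of the lowest-scoring window."""
    profile = hydropathy_profile(protein, window)
    start = min(range(len(profile)), key=profile.__getitem__)
    return start, protein[start:start + window], profile[start]


# Fourth row of Table 81B.
segment = "YRIYLIAKEQARLISDANQKLQIGLEMKNGISQSKERKAVKTLGIVMGVFLICWCPFFICTVMDPFLHYI"
start, peptide, score = most_hydrophilic_window(segment)
print(start, peptide, round(score, 2))
```

Windows with strongly negative mean scores are candidate hydrophilic regions of the kind the text proposes as immunogens; any real selection would also weigh surface exposure and membrane topology, which this sketch ignores.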

NOV82

A disclosed NOV82 nucleic acid of 1033 nucleotides (also referred to as CG57845-01) 
encoding a GPCR-like protein is shown in Table 82A. The start and stop codons are in bold 
letters. 




Table 82A. NOV82 nucleotide sequence (SEQ ID NO:193). 

AACCATGACCAGCAATTTTTCCCAACCTGTTGTGCAGCTTTGCTATGAGGATGTGAATGGATCTTGTATT
GAAACTCCCTATTCTCCTGGGTCCCGGGTAATTCTGTACACGGCGTTTAGCTTTGGGTCTTTGCTGGCTG 
TATTTGGAAATCTCTTAGTAATGACTTCTGTTCTTCATTTTAAGCAGCTGCACTCTCCAACCAATTTTCT 
CATTGCCTCTCTGGCCTGTGCTGACTTCTTGGTAGGTGTGACTGTGATGCTTTTCAGCATGGTCAGGACG 
GTGGAGAGCTGCTGGTATTTTGGAGCCAAATTTTGTACTCTTCACAGTTGCTGTGATGTGGCATTTTGTT 
ACTCTTCTGTCCTCCACTTGTGCTTCATCTGCATCGACAGGTACATTGTGGTTACTGATCCCCTGGTCTA 
TGCTACCAAGTTCACCGTGTCTGTGTCGGGAATTTGCATCAGCGTGTCCTGGATTCTGCCTCTCACGTAC 
AGCGGTGCTGTGTTCTACACAGGTGTCAATGATGATGGGCTGGAGGAATTAGTAAGTGCTCTCAACTGCG 
TAGGTGGCTGTCAAATTATTGTAAGTCAAGGCTGGGTGTTGATAGATTTTCTGTTATTCTTCATACCTAC 
CCTTGTTATGATAATTCTTTACAGTAAGATTTTTCTTATAGCTAAACAACAAGCTATAAAAATTGAAACT
ACTAGTAGCAAAGTAGAATCATCCTCAGAGAGTTATAAAATCAGAGTGGCCAAGAGAGAGAGGAAAGCAG
CTAAAACCCTGGGGGTCACGGTACTAGCATTTGTTATTTCATGGTTACCGTATACAGTTGATATATTAAT 
TGATGCCTTTATGGGCTTCCTGACCCCTGCCTATATCTATGAAATTTGCTGTTGGAGTGCTTATTATAAC 
TCAGCCATGAATCCTTTGATTTATGCTCTATTTTATCCTTGGTTTAGGAAAGCCATAAAACTTATTTTAA
GTGGAGATGTTTTAAAGGCTAGTTCATCAACCATTAGTTTATTTTTAGAATAA 



The disclosed NOV82 polypeptide (SEQ ID NO: 194) encoded by SEQ ID NO: 193 has 
342 amino acid residues and is presented in Table 82B using the one-letter amino acid code. 



Table 82B. Encoded NOV82 protein sequence (SEQ ID NO:194). 

MTSNFSQPVVQLCYEDVNGSCIETPYSPGSRVILYTAFSFGSLLAVFGNLLVMTSVLHFKQLHSPTNFLI 
ASLACADFLVGVTVMLFSMVRTVESCWYFGAKFCTLHSCCDVAFCYSSVLHLCFICIDRYIVVTDPLVYA
TKFTVSVSGICISVSWILPLTYSGAVFYTGVNDDGLEELVSALNCVGGCQIIVSQGWVLIDFLLFFIPTL
VMIILYSKIFLIAKQQAIKIETTSSKVESSSESYKIRVAKRERKAAKTLGVTVLAFVISWLPYTVDILID
AFMGFLTPAYIYEICCWSAYYNSAMNPLIYALFYPWFRKAIKLILSGDVLKASSSTISLFLE 



A search of sequence databases reveals that the NOV82 amino acid sequence has 
145/330 (43%) identity and 214/330 (64%) similarity with SPTREMBL-ACC:O14804
PUTATIVE NEUROTRANSMITTER RECEPTOR - Homo sapiens. Public amino acid 
databases include the GenBank databases, SwissProt, PDB and PIR. 

The disclosed NOV82 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 82C. 



Table 82C. BLAST results for NOV82

Gene Index/Identifier                             Protein/Organism                                                       Length (aa)  Identity (%)    Positives (%)   Expect
gi|16751917|ref|NP_444508.1| (NM_053278)          G protein-coupled receptor 102; trace amine receptor 5 [Homo sapiens]  342          342/342 (100%)  342/342 (100%)  e-166
gi|14600102|gb|AAK71250.1|AF380199_1 (AF380199)   similar to trace amine receptor 4 (H. sapiens) [Homo sapiens]          374          274/344 (79%)   308/344 (88%)   e-143
gi|17453968|ref|XP_069046.1| (XM_069046)          Similar to trace amine receptor 4 [Rattus norvegicus]                  345          273/343 (79%)   303/343 (87%)   e-142
gi|14600086|gb|AAK71242.1|AF380191_1 (AF380191)   trace amine receptor 4 [Rattus norvegicus]                             345          266/343 (77%)   302/343 (87%)   e-142
gi|14600100|gb|AAK71249.1|AF380198_1 (AF380198)   trace amine receptor 10 [Rattus norvegicus]                            344          274/344 (79%)   305/344 (88%)   e-141



Table 82D lists the domain descriptions from DOMAIN analysis results against 
NOV82. This indicates that the NOV82 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 82D. Domain Analysis of NOV82 

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin
family). CD-Length = 254 residues, 100.0% aligned

Score = 132 bits (331), Expect = 4e-32 



The disclosed NOV82 nucleic acid of the invention encoding a GPCR-like protein 
includes the nucleic acid whose sequence is provided in Table 82A or a fragment thereof. The 
invention also includes a mutant or variant nucleic acid any of whose bases may be changed 
from the corresponding base shown in Table 82A while still encoding a protein that maintains 
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The 
invention further includes nucleic acids whose sequences are complementary to those just 
described, including nucleic acid fragments that are complementary to any of the nucleic acids 
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or 
complements thereto, whose structures include chemical modifications. Such modifications 
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least 
in part to enhance the chemical stability of the modified nucleic acid, such that they may be 
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. 
In the mutant or variant nucleic acids, and their complements, up to about 5 percent of the 
bases may be so changed. 
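
A nucleic acid complementary to a disclosed sequence is obtained by reversing the strand and pairing each base with its partner. The sketch below shows this for a short fragment that opens the coding region in Table 82A; the function name `reverse_complement` is illustrative, and the fragment was chosen only for brevity.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


# First 18 nucleotides of the coding region in Table 82A.
fragment = "ATGACCAGCAATTTTTCC"
antisense = reverse_complement(fragment)
print(antisense)
```

Taking the reverse complement twice returns the original fragment, which is a convenient sanity check when preparing antisense sequences of the kind mentioned above.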

The disclosed NOV82 protein of the invention includes the GPCR-like protein whose 
sequence is provided in Table 82B. The invention also includes a mutant or variant protein 
any of whose residues may be changed from the corresponding residue shown in Table 82B
while still encoding a protein that maintains its GPCR-like activities and physiological
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 57 
percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this GPCR-like protein 
(NOV82) may function as a member of a "GPCR family". Therefore, the NOV82 nucleic 
acids and proteins identified here may be useful in potential therapeutic applications 
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug 
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene 
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues 
and cell types composing (but not limited to) those defined here. 

The NOV82 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein 
(NOV82) may be useful in gene therapy, and the GPCR-like protein (NOV82) may be useful 
when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD;
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease, 
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders; 
psoriasis, colon cancer, leukemia, AIDS; thalamus disorders; metabolic disorders including
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic
disorders including pancreatic insufficiency and cancer; and prostate disorders including 
prostate cancer, or other pathologies or conditions. The NOV82 nucleic acid encoding the 
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic 
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV82 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV82 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV82 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 

NOV83 

A disclosed NOV83 nucleic acid of 1045 nucleotides (also referred to as CG57843-01) 
encoding a GPCR-like protein is shown in Table 83A. The start and stop codons are in bold 
letters. 



Table 83A. NOV83 nucleotide sequence (SEQ ID NO:195). 

CGTTATGAGCAGCAATTCATCCCTGCTGGTGGCTGTGCAGCTGTGCTACGCGAACGTGAATGGGTCCTGT
GTGAAAATCCCCTTCTCGCCGGGATCCCGGGTGATTCTGTACATAGTGTTTGGCTTTGGGGCTGTGCTGG 
CTGTGTTTGGAAACCTCCTGGTGATGATTTCAATCCTCCATTTCAAGCAGCTGCACTCTCCGACCAATTT 
TCTCGTTGCCTCTCTGGCCTGCGCTGATTTCTTGGTGGGTGTGACTGTGATGCCCTTCAGCATGGTCAGG 
ACGGTGGAGAGCTGCTGGTATTTTGGGAGGAGTTTTTGTACTTTCCACACCTGCTGTGATGTGGCATTTT 
GTTACTCTTCTCTCTTTCACTTGTGCTTCATCTCCATCGACAGGTACATTGCGGTTACTGACCCCCTGGT 
CTATCCTACCAAGTTCACCGTATCTGTGTCAGGAATTTGCATCAGCGTGTCCTGGATCCTGCCCCTCATG 
TACAGCGGTGCTGTGTTCTACACAGGTGTCTATGACGATGGGCTGGAGGAATTATCTGATGCCCTAAACT 
GTATAGGAGGTTGTCAGACCGTTGTAAATCAAAACTGGGTGTTGACAGATTTTCTATCCTTCTTTATACC 
TACCTTTATTATGATAATTCTGTATGGTAACATATTTCTTGTGGCTAGACGACAGGCGAAAAAGATAGAA
AATACTGGTAGCAAGACAGAATCATCCTCAGAGAGTTACAAAGCCAGAGTGGCCAGGAGAGAGAGAAAAG
CAGCTAAAACCCTGGGGGTCACAGTGGTAGCATTTATGATTTCATGGTTACCATATAGCATTGATTCATT
AATTGATGCCTTTATGGGCTTTATAACCCCTGCCTGTATTTATGAGATTTGCTGTTGGTGTGCTTATTAT 
AACTCAGCCATGAATCCTTTGATTTATGCTTTATTTTACCCATGGTTTAGGAAAGCAATAAAAGTTATTG
TAACTGGTCAGGTTTTAAAGAACAGTTCAGCAACCATGAATTTGTTTTCTGAACATATATAAGCA



The disclosed NOV83 polypeptide (SEQ ID NO: 196) encoded by SEQ ID NO: 195 
has 345 amino acid residues and is presented in Table 83B using the one-letter amino acid 
code. 



Table 83B. Encoded NOV83 protein sequence (SEQ ID NO:196). 

MSSNSSLLVAVQLCYANVNGSCVKIPFSPGSRVILYIVFGFGAVLAVFGNLLVMISILHFKQLHSPTNFL 
VASLACADFLVGVTVMPFSMVRTVESCWYFGRSFCTFHTCCDVAFCYSSLFHLCFISIDRYIAVTDPLVY
PTKFTVSVSGICISVSWILPLMYSGAVFYTGVYDDGLEELSDALNCIGGCQTVVNQNWVLTDFLSFFIPT
FIMIILYGNIFLVARRQAKKIENTGSKTESSSESYKARVARRERKAAKTLGVTVVAFMISWLPYSIDSLI
DAFMGFITPACIYEICCWCAYYNSAMNPLIYALFYPWFRKAIKVIVTGQVLKNSSATMNLFSEHI 



A search of sequence databases reveals that the NOV83 amino acid sequence has 
146/330 (44%) identity and 216/330 (65%) similarity with SPTREMBL-ACC:O14804 
PUTATIVE NEUROTRANSMITTER RECEPTOR - Homo sapiens. Public amino acid 
databases include the GenBank databases, SwissProt, PDB and PIR.

The disclosed NOV83 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 83C.



Table 83C. BLAST results for NOV83

Gene Index/Identifier                             Protein/Organism                                                       Length (aa)  Identity (%)    Positives (%)   Expect
gi|17453968|ref|XP_069046.1| (XM_069046)          similar to trace amine receptor 4 (H. sapiens)                         345          345/345 (100%)  345/345 (100%)  e-180
gi|14600086|gb|AAK71242.1|AF380191_1 (AF380191)   trace amine receptor 4 [Rattus norvegicus]                             345          302/345 (87%)   321/345 (92%)   e-162
gi|14600102|gb|AAK71250.1|AF380199_1 (AF380199)   trace amine receptor 11 [Rattus norvegicus]                            373          265/345 (76%)   306/345 (87%)   e-141
gi|16751917|ref|NP_444508.1| (NM_053278)          G protein-coupled receptor 102; trace amine receptor 5 [Homo sapiens]  342          273/343 (79%)   303/343 (87%)   e-137
gi|14600094|gb|AAK71246.1|AF380195_1 (AF380195)   trace amine receptor 7 [Rattus norvegicus]                             344          258/345 (74%)   302/345 (86%)   e-137



Table 83D lists the domain descriptions from DOMAIN analysis results against
NOV83. This indicates that the NOV83 sequence has properties similar to those of other
proteins known to contain this domain.



Table 83D. Domain Analysis of NOV83 

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin
family). CD-Length = 254 residues, 100.0% aligned

Score = 130 bits (326), Expect = 2e-31

The disclosed NOV83 nucleic acid of the invention encoding a GPCR-like protein
includes the nucleic acid whose sequence is provided in Table 83A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 83A while still encoding a protein that maintains
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least 
in part to enhance the chemical stability of the modified nucleic acid, such that they may be 
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. 
In the mutant or variant nucleic acids, and their complements, up to about 5 percent of the 
bases may be so changed. 

The disclosed NOV83 protein of the invention includes the GPCR-like protein whose 
sequence is provided in Table 83B. The invention also includes a mutant or variant protein 
any of whose residues may be changed from the corresponding residue shown in Table 83B 
while still encoding a protein that maintains its GPCR-like activities and physiological 
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 56
percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this GPCR-like protein 
(NOV83) may function as a member of a "GPCR family". Therefore, the NOV83 nucleic 
acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The 
potential therapeutic applications for this invention include, but are not limited to: protein 
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug 
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene 
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues 
and cell types composing (but not limited to) those defined here. 

The NOV83 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein 
(NOV83) may be useful in gene therapy, and the GPCR-like protein (NOV83) may be useful 
when administered to a subject in need thereof. By way of nonlimiting example, the 
compositions of the present invention will have efficacy for treatment of patients suffering 
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD; 
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease, 
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders; 
psoriasis, colon cancer, leukemia, AIDS; thalamus disorders; metabolic disorders including
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic
disorders including pancreatic insufficiency and cancer; and prostate disorders including 
prostate cancer, or other pathologies or conditions. The NOV83 nucleic acid encoding the 
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic 
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV83 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV83 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV83 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various
disorders.



NOV84 

NOV84 includes two GPCR-like proteins disclosed below. The disclosed sequences 
have been named NOV84a and NOV84b. 



NOV84a 

A disclosed NOV84a nucleic acid of 948 nucleotides (also referred to as CG57841-01)
encoding a GPCR-like protein is shown in Table 84A. The start and stop codons are in bold
letters. 



Table 84A. NOV84a nucleotide sequence (SEQ ID NO:197).

CCTTTGTAAATGGCCTTGGGGAATCACAGCACCATCACCGAGTTCCTCCTCCTTGGGCTGTCTGCCGACC
CCAACATCCGGGCTCTGCTCTTTGTGCTGTTCCTGGGGATTTACCTCCTGACCATAATGGAAAACCTGAT 
GCTGCTGCTCATGATCAGGGCTGATTCTTGTCTCCATAAGCCCATGTATTTCTTCCTGAGTCACCTCTCT 
TTTGTTGATCTCTGCTTCTCTTCAGTCATTGTGCCCAAGATGCTGGAGAACCTCCTGTCACAGAGGAAAA 
CCATTTCAGTAGAGGGCTGCCTGGCTCAGGTCTTCTTTGTGTTTGTCACTGCAGGGACTGAAGCCTGCCT 
TCTCTCAGGGATGGCCTATGACCGCCATGCTGCCATCTGCCGCCCACTACTTTATGGACAGATCATGGGT 
AAACAGCTGTATATGCACCTTGTGTGGGGCTCATGGGGACTGGGCTTTCTGGACGCACTCATCAATGTCC 
TCCTAGCTGTAAACATGGTCTTTTGTGAAGCCAAAATCATTCACCACTACAGCTATGAGATGCCATCCCT 
CCTCCCTCTGTCCTGCTCTGATATCTCCAGAAGCCTCATCGCCTTGCTCTGCTCCACTCTCCTACATGGG 
CTGGGAAACTTCCTTTTGGTCTTCTTATCCTACACCCGTATAATCTCTACCATCCTAAGCATCAGCTCTA 
CCTCGGGCAGAAGCAAGGCCTTCTCCACCTGCTCTGCCCACCTCACTGCAGTGACACTTTACTATGGCTC 
AGGTTTGCTCCGCCATCTCATGCCAAACTCAGGTTCCCCCATAGAGTTGATCTTCTCTGTGCAGTATACT 
GTAGTCACTCCCATGCTGAATTCCCTCATCTATAGCCTGAAAAATAAGGAAGTGAAGGTAGCTCTGAAAA 
GAACTTTGGAAAAATATTTGCAATATACCAGACGTTGA

The disclosed NOV84a polypeptide (SEQ ID NO:198) encoded by SEQ ID NO:197
has 312 amino acid residues and is presented in Table 84B using the one-letter amino acid 
code. 



Table 84B. Encoded NOV84a protein sequence (SEQ ID NO: 198). 



MALGNHSTITEFLLLGLSADPNIRALLFVLFLGIYLLTIMENLMLLLMIRADSCLHKPMYFFLSHLSFVD 
LCFSSVIVPKMLENLLSQRKTISVEGCLAQVFFVFVTAGTEACLLSGMAYDRHAAICRPLLYGQIMGKQL 
YMHLVWGSWGLGFLDALINVLLAVNMVFCEAKIIHHYSYEMPSLLPLSCSDISRSLIALLCSTLLHGLGN 
FLLVFLSYTRIISTILSISSTSGRSKAFSTCSAHLTAVTLYYGSGLLRHLMPNSGSPIELIFSVQYTVVT
PMLNSLIYSLKNKEVKVALKRTLEKYLQYTRR 



A search of sequence databases reveals that the NOV84a amino acid sequence has 
138/309 (44%) identity and 203/309 (65%) similarity with TREMBLNEW-ACC:AAG39860 
ODORANT RECEPTOR K15 - Mus musculus. Public amino acid databases include the 
GenBank databases, SwissProt, PDB and PIR. 



NOV84b 

A disclosed NOV84b nucleic acid of 1039 nucleotides (also referred to as CG57841- 
02) encoding a GPCR-like protein is shown in Table 84C. The start and stop codons are in 
bold letters. 



Table 84C. NOV84b nucleotide sequence (SEQ ID NO:199). 

GAATGATGCCCTTTTGCCACAATATAATTAATATTTCCTGTGTGAAAAACAACTGGTCAA 
ATGATGTCCGTGCTTCCCTGTACAGTTTAATGGTGCTCATAATTCTGACCACACTCGTTG 
GCAATCTGATAGTTATTGTTTCTATATCACACTTCAAACAACTTCATACCCCAACAAATT
GGCTCATTCATTCCATGGCCACTGTGGACTTTCTTCTGGGGTGTCTGGTCATGCCTTACA 
GTATGGTGAGATCTGCTGAGCACTGTTGGTATTTTGGAGAAGTCTTCTGTAAAATTCACA 
CAAGCACCGACATTATGCTGAGCTCAGCCTCCATTTTCCATTTGTCTTTCATCTCCATTG 
ACCGCTACTATGCTGTGTGTGATCCACTGAGATATAAAGCCAAGATGAATATCTTGGTTA 
TTTGTGTGATGATCTTCATTAGTTGGAGTGTCCCTGCTGTTTTTGCATTTGGAATGATCT 
TTCTGGAGCTAAACTTCAAAGGCGCTGAAGAGATATATTACAAACATGTTCACTGCAGAG 
GAGGTTGCTCTGTCTTCTTTAGCAAAATATCTGGGGTACTGACCTTTATGACTTCTTTTT 
ATATACCTGGATCTATTATGTTATGTGTCTATTACAGAATATATCTTATCGCTAAAGAAC 
AGGCAAGATTAATTAGTGATGCCAATCAGAAGCTCCAAATTGGATTGGAAATGAAAAATG 
GAATTTCACAAAGCAAAGAAAGGAAAGCTGTGAAGACATTGGGGATTGTGATGGGAGTTT 
TCCTAATATGCTGGTGCCCTTTCTTTATCTGTACAGTCATGGACCCTTTTCTTCACTACA 
TTATTCCACCTACTTTGAATGATGTATTGATTTGGTTTGGCTACTTGAACTCTACATTTA 
ATCCAATGGTTTATGCATTTTTCTATCCTTGGTTTAGAAAAGCACTGAAGATGATGCTGT 
TTGGTAAAATTTTCCAAAAAGATTCATCCAGGTGTAAATTATTTTTGGAATTGAGTTCAT 
AGAATTATTATATTTTACT 



In a search of public sequence databases, the NOV84b nucleic acid sequence, located
on chromosome 6, has 616 of 979 bases (62%) identical to a
gb:GENBANK-ID:HSU88828|acc:U88828.1 mRNA from Homo sapiens (Homo sapiens
serotonin-4-receptor-like pseudogene). Public nucleotide databases include all GenBank
databases and the GeneSeq patent database.

The disclosed NOV84b polypeptide (SEQ ID NO:200) encoded by SEQ ID NO:199
has 339 amino acid residues and is presented in Table 84D using the one-letter amino acid
code. SignalP, PSORT and/or Hydropathy results predict that NOV84b has a signal peptide and
is likely to be localized in the plasma membrane with a certainty of 0.6000. The most likely
cleavage site for a NOV84b peptide is between amino acids 47 and 48.

Table 84D. Encoded NOV84b protein sequence (SEQ ID NO:200). 

MMPFCHNIINISCVKNNWSNDVRASLYSLMVLIILTTLVGNLIVIVSISHFKQLHTPTNW
LIHSMATVDFLLGCLVMPYSMVRSAEHCWYFGEVFCKIHTSTDIMLSSASIFHLSFISID
RYYAVCDPLRYKAKMNILVICVMIFISWSVPAVFAFGMIFLELNFKGAEEIYYKHVHCRG
GCSVFFSKISGVLTFMTSFYIPGSIMLCVYYRIYLIAKEQARLISDANQKLQIGLEMKNG
ISQSKERKAVKTLGIVMGVFLICWCPFFICTVMDPFLHYIIPPTLNDVLIWFGYLNSTFN
PMVYAFFYPWFRKALKMMLFGKIFQKDSSRCKLFLELSS 
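
The predicted cleavage between amino acids 47 and 48 can be pictured by splitting the listed sequence at that position. This is only a sketch: it uses the first 60 residues of the NOV84b listing, and the function and variable names are hypothetical. It does not predict anything; it applies the position stated in the text.

```python
# First 60 residues of the NOV84b protein as listed in Table 84D.
NOV84B_N_TERMINUS = "MMPFCHNIINISCVKNNWSNDVRASLYSLMVLIILTTLVGNLIVIVSISHFKQLHTPTNW"


def split_at_cleavage_site(protein: str, last_signal_residue: int) -> tuple[str, str]:
    """Split a protein after the given 1-based residue position."""
    return protein[:last_signal_residue], protein[last_signal_residue:]


# Cleavage between residues 47 and 48, as stated in the text.
signal_peptide, mature_fragment = split_at_cleavage_site(NOV84B_N_TERMINUS, 47)
print(len(signal_peptide), signal_peptide[-7:], mature_fragment[:6])
```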

A search of sequence databases reveals that the NOV84b amino acid sequence has 152 
of 299 amino acid residues (50%) identical to, and 206 of 299 amino acid residues (68%) 
similar to, the 343 amino acid residue ptnr:SPTREMBL-ACC:Q9P1P4 protein from Homo
sapiens (Human) (G PROTEIN-COUPLED RECEPTOR 57). Public amino acid databases 
include the GenBank databases, SwissProt, PDB and PIR. 

NOV84b is expressed in at least apical microvilli of the retinal pigment epithelium,
arterial (aortic), basal forebrain, brain, Burkitt lymphoma cell lines, corpus callosum, cardiac 
(atria and ventricle), caudate nucleus, CNS and peripheral tissue, cerebellum, cerebral cortex, 
colon, cortical neurogenic cells, endothelial (coronary artery and umbilical vein) cells, palate 
epithelia, eye, neonatal eye, frontal cortex, fetal hematopoietic cells, heart, hippocampus, 
hypothalamus, leukocytes, liver, fetal liver, lung, lung lymphoma cell lines, fetal lymphoid 
tissue, adult lymphoid tissue, Those that express MHC II and III nervous, medulla, 
subthalamic nucleus, ovary, pancreas, pituitary, placenta, pons, prostate, putamen, serum, 
skeletal muscle, small intestine, smooth muscle (coronary artery in aortic), spinal cord, spleen,
stomach, taste receptor cells of the tongue, testis, thalamus, and thymus tissue. This 
information was derived by determining the tissue sources of the sequences that were included 
in the invention including but not limited to SeqCalling sources, Public EST sources, 
Literature sources, and/or RACE sources. 

The disclosed NOV84b polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 84E. 

Table 84E. BLAST results for NOV84b

Gene Index/Identifier                     Protein/Organism                              Length (aa)  Identity (%)    Positives (%)   Expect
gi|17474603|ref|XP_062552.1| (XM_062552)  similar to olfactory receptor [Homo sapiens]  312          312/312 (100%)  312/312 (100%)  e-145
gi|18479402|gb|AAL60715.1| (AY073052)     olfactory receptor MOR160-1 [Mus musculus]    309          216/304 (71%)   254/304 (83%)   e-105
gi|18480580|gb|AAL61304.1| (AY073641)     olfactory receptor MOR160-2 [Mus musculus]    308          212/307 (69%)   256/307 (83%)   e-95
gi|18480924|gb|AAL61476.1| (AY073813)     olfactory receptor MOR160-5 [Mus musculus]    311          211/309 (68%)   258/309 (83%)   e-94
gi|18480922|gb|AAL61475.1| (AY073812)     olfactory receptor MOR160-4 [Mus musculus]    305          184/302 (60%)   233/302 (76%)   6e-84


Table 84F lists the domain descriptions from DOMAIN analysis results against 
NOV84b. This indicates that the NOV84b sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 84F. Domain Analysis of NOV84b 

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin
family). CD-Length = 254 residues, 44.9% aligned

Score = 68.6 bits (166), Expect = 5e-13 



The disclosed NOV84b nucleic acid of the invention encoding a GPCR-like protein
includes the nucleic acid whose sequence is provided in Table 84C or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 84C while still encoding a protein that maintains
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 38 percent of the
bases may be so changed.

The disclosed NOV84b protein of the invention includes the GPCR-like protein whose
sequence is provided in Table 84D. The invention also includes a mutant or variant protein any
of whose residues may be changed from the corresponding residue shown in Table 84D while
still encoding a protein that maintains its GPCR-like activities and physiological functions, or
a functional fragment thereof. In the mutant or variant protein, up to about 50 percent of the
residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this GPCR-like protein
(NOV84b) may function as a member of a "GPCR family". Therefore, the NOV84b nucleic
acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV84b nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein
(NOV84b) may be useful in gene therapy, and the GPCR-like protein (NOV84b) may be
useful when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD;
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease,
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders;
psoriasis, colon cancer, leukemia, AIDS; thalamus disorders; metabolic disorders including
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic
disorders including pancreatic insufficiency and cancer; and prostate disorders including
prostate cancer, or other pathologies or conditions. The NOV84b nucleic acid encoding the
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV84b nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV84b substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti- 
NOVX Antibodies" section below. The disclosed NOV84b proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 



NOV85 

A disclosed NOV85 nucleic acid of 963 nucleotides (also referred to as CG57839-01) 
encoding a GPCR-like protein is shown in Table 85A. The start and stop codons are in bold 
letters. 



Table 85A. NOV85 nucleotide sequence (SEQ ID NO:201). 

AACATGGAAAGCAATCAGACCTGGATCACAGAAGTCATCCTGTTGGGATTCCAGGTTGGACCAGCTCTGG
AGTTGTTCCTCTTTGGGTTTTTCTTGCTATTCTACAGCTTAACCCTGATGGGAAATTTGGACTCTAGACT 
GCACACACCCATGTATGTCTTCCTGTCACATCTGGCCATTGTGGACATGTCCTATGCCTCGAGTACTGTC 
CCTAAGATGCTAGCAAATCTTGTGATGCACAAAAAAGTCATCTCCTTTGCTCCTTGCATACTTCAGACTT
TTTTGTATTTGGCGTTTGCTATTACAGAGTGTCTGATTTTGGTGATGATGTGCTATGATCGGTATGTGGC 
AATCTGTCACCCCTTGCAATACACCCTCATTATGAACTGGAGAGTGTGCACTGTCCTGGCCTCAACTTGC 
TGGATATTTAGCTTTCTCTTGGCTCTGGTCCATATTACTCTTATTCTGAGGCTGCCTTTTTGTGGCCACA 
AAAGATCAACCACTTTTTTTTTGTGGCCACAAAAGATCAACCACTTTTTCTGTCAAATCATGTCCGTATT 
CAAATTGGCCTGTGCTGACACTAGGCTCAACCAGGTGGTCCTATTTGCGGGTTCTGCGTTCATCTTAGTG 
GGGCCGCTCTGCCTGGTGCTGGTCTCCTACTTGCACATCCTGGTGGCCATCTTGAGGATCCAGTCTGGGG 
AGGGCCGCAGAAAGGCCTTCTCTACCTGCTCCTCCCACCTCTGCGTGGTGGGGCTTTTCTTTGGCAGCGC 
CATTGTCATGTACATGGCCCCCAAGTCAAGCCATTCTCAAGAACGGAGGAAGATCCTTTCCCTGTTTTAC
AGCCTTTTCAACCCGATCCTGAACCCCCTCATCTACAGCCTTAATGCAGAGGTGAAAGGGGCTCTAAAGA
GAGTCCTTTGGAAACAGAGATCAATTGAAGAATCATTTGAGATTTCCTGAGAA



The disclosed NOV85 polypeptide (SEQ ID NO:202) encoded by SEQ ID NO:201 has
318 amino acid residues and is presented in Table 85B using the one-letter amino acid code.



Table 85B. Encoded NOV85 protein sequence (SEQ ID NO:202). 

MESNQTWITEVILLGFQVGPALELFLFGFFLLFYSLTLMGNLDSRLHTPMYVFLSHLAIVDMSYASSTVP 
KMLANLVMHKKVISFAPCILQTFLYIAFAITECLILVMMCYDRYVAICHPLQYTLIMNWRVCTVLASTCW
IFSFLLALVHITLILRLPFCGHKRSTTFFLWPQKINHFFCQIMSVFKLACADTRLNQVVLFAGSAFILVG
PLCLVLVSYLHILVAILRIQSGEGRRKAFSTCSSHLCVVGLFFGSAIVMYMAPKSSHSQERRKILSLFYS
LFNPILNPLIYSLNAEVKGALKRVLWKQRSIEESFEIS

A search of sequence databases reveals that the NOV85 amino acid sequence has
181/311 (58%) identity and 232/311 (74%) similarity with SPTREMBL-ACC:O95047
WUGSC:H_DJ0988G15.2 PROTEIN - Homo sapiens. Public amino acid databases include
the GenBank databases, SwissProt, PDB and PIR.

The disclosed NOV85 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 85C.



Table 85C. BLAST results for NOV85

Gene Index/Identifier                     Protein/Organism                              Length (aa)  Identity (%)   Positives (%)  Expect
gi|18565914|ref|XP_095005.1| (XM_095005)  protein XP_095005 [Homo sapiens]              310          299/320 (93%)  300/320 (93%)  e-134
gi|18480894|gb|AAL61461.1| (AY073798)     olfactory receptor MOR261-12 [Mus musculus]   308          254/317 (80%)  277/317 (87%)  e-115
gi|18480182|gb|AAL61105.1| (AY073442)     olfactory receptor MOR261-4 [Mus musculus]    310          238/320 (74%)  266/320 (82%)  e-111
gi|18480180|gb|AAL61104.1| (AY073441)     olfactory receptor MOR261-3 [Mus musculus]    310          235/320 (73%)  263/320 (81%)  e-110
gi|18565912|ref|XP_069619.2| (XM_069619)  similar to olfactory receptor [Homo sapiens]  311          231/320 (72%)  258/320 (80%)  e-108

Table 85D lists the domain descriptions from DOMAIN analysis results against
NOV85. This indicates that the NOV85 sequence has properties similar to those of other
proteins known to contain this domain.



Table 85D. Domain Analysis of NOV85 

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin
family).

CD-Length = 254 residues, 94.9% aligned 

Score = 111 bits (278), Expect = 5e-26 
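
The "% aligned" figure in the domain analysis can be converted into an approximate number of domain positions covered. The sketch below does that arithmetic for the CD-Length and percentages quoted in Tables 84F, 85D and 87D; rounding to the nearest residue is an assumption, and the function name is illustrative.

```python
def aligned_residues(cd_length: int, percent_aligned: float) -> int:
    """Approximate count of domain positions covered by the alignment."""
    # Rounding to the nearest whole residue is assumed, not stated in the text.
    return round(cd_length * percent_aligned / 100)


for label, percent in (("NOV84b", 44.9), ("NOV85", 94.9), ("NOV87", 100.0)):
    print(label, aligned_residues(254, percent))
```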



The disclosed NOV85 nucleic acid of the invention encoding a GPCR-like protein
includes the nucleic acid whose sequence is provided in Table 85A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 85A while still encoding a protein that maintains
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 5 percent of the
bases may be so changed.

The disclosed NOV85 protein of the invention includes the GPCR-like protein whose
sequence is provided in Table 85B. The invention also includes a mutant or variant protein
any of whose residues may be changed from the corresponding residue shown in Table 85B
while still encoding a protein that maintains its GPCR-like activities and physiological
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 42
percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this GPCR-like protein
(NOV85) may function as a member of a "GPCR family". Therefore, the NOV85 nucleic
acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV85 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein
(NOV85) may be useful in gene therapy, and the GPCR-like protein (NOV85) may be useful
when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD;
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease,
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders;
psoriasis, colon cancer, leukemia, AIDS; thalamus disorders; metabolic disorders including
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic
disorders including pancreatic insufficiency and cancer; and prostate disorders including
prostate cancer, or other pathologies or conditions. The NOV85 nucleic acid encoding the
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV85 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOV85 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV85 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.

NOV86 

A disclosed NOV86 nucleic acid of 971 nucleotides (also referred to as CG-01)
encoding a -like protein is shown in Table 86A. The start and stop codons are in bold letters.



Table 86A. NOV86 nucleotide sequence (SEQ ID NO:203). 



TACCCTTGTTAATGGCCTTTGGGGAATTCCCCAGCCCCATTCCCCCGAGTTCCCTCCTCCCTTTGGGCTT
GTTCTGCGGACCCCCCAAACAATCCCGGGGCTTTCTGCCCTTCTTTGTGCTGTTCCTGGGGATTTACCTC 
CTGACCATAATGGAAAACCTGATGCTGCTGCTCATGATCAGGGCTGATTCTTGTCTCCATAAGCCCATGT 
ATTTCTTCCTGAGTCACCTCTCTTTTGTTGATCTCTGCTTCTCTTCAGTCATTGTGCCCAAGATGCTGGA 
GAACCTCCTGTCACAGAGGAAAACCATTTCAGTAGAGGGCTGCCTGGCTCAGGTCTTCTTTGTGTTTGTC 
ACTGCAGGGACTGAAGCCTGCCTTCTCTCAGGGATGGCCTATGACCGCCATGCTGCCATCTGCCGCCCAC 
TACTTTATGGACAGATCATGGGTAAACAGCTGTATATGCACCTTGTGTGGGGCTCATGGGGACTGGGCTT 
TCTGGACGCACTCATCAATGTCCTCCTAGCTGTAAACATGGTCTTTTGTGAAGCCAAAATCATTCACCAC 
TACAGCTATGAGATGCCATCCCTCCTCCCTCTGTCCTGCTCTGATATCTCCAGAAGCCTCATCGCCTTGC 
TCTGCTCCACTCTCCTACATGGGCTGGGAAACTTCCTTTTGGTCTTCTTATCCTACACCCGTATAATCTC 
TACCATCCTAAGCATCAGCTCTACCTCGGGCAGAAGCAAGGCCTTCTCCACCTGCTCTGCCCACCTCACT 
GCAGTGACACTTTACTATGGCTCAGGTTTGCTCCGCCATCTCATGCCAAACTCAGGTTCCCCCATAGAGT 
TGATCTTCTCTGTGCAGTATACTGTAGTCACTCCCATGCTGAATTCCCTCATCTATAGCCTGAAAAATAA 
GGAAGTGAAGGTAGCTCTGAAAAGAACTTTGGAAAAATATTTGCAATATACCAGACGTTGA 

The disclosed NOV86 polypeptide (SEQ ID NO:204) encoded by SEQ ID NO:203
has 319 amino acid residues and is presented in Table 86B using the one-letter amino acid
code. 



Table 86B. Encoded NOV86 protein sequence (SEQ ID NO:204). 

MAFGEFPSPIPPSSLLPLGLFCGPPKQSRGFLPFFVLFLGIYLLTIMENLMLLLMIRADSCLHKPMYFFL 
SHLSFVDLCFSSVIVPKMLENLLSQRKTISVEGCLAQVFFVFVTAGTEACLLSGMAYDRHAAICRPLLYG 
QIMGKQLYMHLVWGSWGLGFLDALINVLLAVNMVFCEAKIIHHYSYEMPSLLPLSCSDISRSLIALLCST 
LLHGLGNFLLVFLSYTRIISTILSISSTSGRSKAFSTCSAHLTAVTLYYGSGLLRHLMPNSGSPIELIFS
VQYTVVTPMLNSLIYSLKNKEVKVALKRTLEKYLQYTRR



A search of sequence databases reveals that the NOV86 amino acid sequence has 
134/300 (44%) identity and 192/300 (64%) similarity with SPTREMBL-ACC:O35184
OLFACTORY RECEPTOR - Rattus norvegicus. Public amino acid databases include the 
GenBank databases, SwissProt, PDB and PIR. 



The disclosed NOV86 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 86C. 



Table 86C. BLAST results for NOV86

Gene Index/Identifier                     Protein/Organism                              Length (aa)  Identity (%)   Positives (%)  Expect
gi|17474603|ref|XP_062552.1| (XM_062552)  similar to olfactory receptor [Homo sapiens]  312          290/307 (94%)  293/307 (94%)  e-134
gi|18479402|gb|AAL60715.1| (AY073052)     olfactory receptor MOR160-1 [Mus musculus]    309          200/281 (71%)  234/281 (83%)  3e-97
gi|18480580|gb|AAL61304.1| (AY073641)     olfactory receptor MOR160-2 [Mus musculus]    308          193/280 (68%)  236/280 (83%)  8e-88
gi|18480924|gb|AAL61476.1| (AY073813)     olfactory receptor MOR160-5 [Mus musculus]    311          192/282 (68%)  237/282 (83%)  5e-87
gi|18480922|gb|AAL61475.1| (AY073812)     olfactory receptor MOR160-4 [Mus musculus]    305          170/279 (60%)  215/279 (76%)  e-78

Table 86D lists the domain descriptions from DOMAIN analysis results against 
NOV86. This indicates that the NOV86 sequence has properties similar to those of other 
proteins known to contain this domain. 

Table 86D. Domain Analysis of NOV86

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin
family).

CD-Length = 254 residues, 44.9% aligned

Score = 68.2 bits (165), Expect = 7e-13



The disclosed NOV86 nucleic acid of the invention encoding a GPCR-like protein 
includes the nucleic acid whose sequence is provided in Table 86A or a fragment thereof. The 
invention also includes a mutant or variant nucleic acid any of whose bases may be changed 
from the corresponding base shown in Table 86A while still encoding a protein that maintains 
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The 
invention further includes nucleic acids whose sequences are complementary to those just 
described, including nucleic acid fragments that are complementary to any of the nucleic acids 
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or 
complements thereto, whose structures include chemical modifications. Such modifications 
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least 
in part to enhance the chemical stability of the modified nucleic acid, such that they may be 
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. 
In the mutant or variant nucleic acids, and their complements, up to about 5 percent of the 
bases may be so changed. 
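
The two ideas in the paragraph above, a complementary nucleic acid and a variant in which a bounded share of bases differs from the listed sequence, can be sketched in a few lines. The reference below is the first 40 coding bases of the NOV86 listing in Table 86A; the variant is a made-up sequence with two substitutions, used only to show the arithmetic. Function names are illustrative, and the comparison assumes substitutions only (equal lengths, no insertions or deletions).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def percent_changed(reference: str, variant: str) -> float:
    """Percentage of positions that differ (substitutions only)."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    changed = sum(1 for r, v in zip(reference, variant) if r != v)
    return 100.0 * changed / len(reference)


# First 40 coding bases of the NOV86 listing (Table 86A).
reference = "ATGGCCTTTGGGGAATTCCCCAGCCCCATTCCCCCGAGTT"
# Hypothetical variant with two substitutions, for illustration only.
variant = "ATGGCATTTGGGGAATTCCCCAGCCCCATTCCCCCGAGTC"

print(reverse_complement(reference[:6]))
print(percent_changed(reference, variant))
```

Whether a given variant still encodes a functional protein is a separate question that this arithmetic does not address.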

The disclosed NOV86 protein of the invention includes the GPCR-like protein whose 
sequence is provided in Table 86B. The invention also includes a mutant or variant protein 
any of whose residues may be changed from the corresponding residue shown in Table 86B 
while still encoding a protein that maintains its GPCR-like activities and physiological 
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 56 
percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this GPCR-like protein
(NOV86) may function as a member of a "GPCR family". Therefore, the NOV86 nucleic
acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.






The NOV86 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein
(NOV86) may be useful in gene therapy, and the GPCR-like protein (NOV86) may be useful
when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD;
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease,
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders;
psoriasis, colon cancer, leukemia, AIDS; thalamus disorders; metabolic disorders including
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic
disorders including pancreatic insufficiency and cancer; and prostate disorders including
prostate cancer, or other pathologies or conditions. The NOV86 nucleic acid encoding the
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV86 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV86 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the "Anti-NOVX
Antibodies" section below. The disclosed NOV86 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 



NOV87

A disclosed NOV87 nucleic acid of 1067 nucleotides (also referred to as CG56763-01)
encoding a GPCR-like protein is shown in Table 87A. The start and stop codons are in bold
letters.

Table 87A. NOV87 nucleotide sequence (SEQ ID NO:205).

CCCTTTCCTCTTGCTCTTTGATGTTTTGTAGGCCTGCAGCTCCCAAGCACAGAGGCATGAGTGGGGAGAA
TGTCACCAGGGTCGGCACCTTCATCCTGGTGGGCTTCCCCACGGCCCCAGGGCTGCAGTACCTGCTCTTC 
CTCCTCTTCCTGCTCACCTACCTCTTTGTCCTGGTGGAGAACCTGGCCATCATCCTCACCGTCTGGAGCA 
GCACCTCCCTCCACAGGCCCATGTACTACTTTCTGAGCTCCATGTCTTTCCTAGAGATCTGGTACGTGTC 
TGACATCACCCCCAAGATGCTGGAGGGCTTCCTCCTCCAGCAGAAACGCATCTCTTTCGTCGGGTGCATG 
ACGCAGCTCTACTTCTTCAGCTCCCTGGTGTGCACCGAGTGTGTGCTTCTGGCCTCCATGGCCTACGACC 
GCTACGTGGCCATCTGCCACCCGCTGCGCTACCACGTCCTTGTGACCCCGGGGCTGTGCCTCCAGCTGGT 
GGGCTTCTCCTTTGTGAGTGGCTTCACCATCTCCATGATCAAGGTCTGTTTTATCTCCAGCGTCACGTTC 
TGTGGCTCCAACGTCTTGAACCACTTCTTCTGTGACATTTCCCCCATCCTCAAGCTGGCCTGCACGGACT 
TCTCCACTGCAGAGCTGGTGGATTTCATTCTGGCCTTCATCATCCTGGTGTTTCCACTCCTGGCCACCAT 
GCTGTCATATGCGCACATCACCCTGGCTGTCCTGCGCATCCCCTCGGCCACCGGCTGCTGGAGAGCCTTC 
TTCACCTGCGCCTCTCACCTCACCGTGGTCACCGTCTTCTATACAGCCTTGCTTTTCATGTATGTCCGGC 
CCCAGGCCATTGATTCCCGGAGCTCCAACAAGCTCATCTCTGTTTTGTACACAGTTATCACCCCCATCTT
GAACCCCTTGATATACTGCCTGAGGAATAAGGAATTTAAGAATGCCTTGAAAAAAGCCTTCGGCTTGACG 
AGCTGCGCCGTAGAGGGGAGGCTTTCTAGTCTTCTGGAACTTCATCTCCAAATACACAGCCAGCCTCTCT 
GAGGAGGCCATTTGACT



The disclosed NOV87 polypeptide (SEQ ID NO:206) encoded by SEQ ID NO:205 
has 343 amino acid residues and is presented in Table 87B using the one-letter amino acid 
code. 
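
The stated lengths can be cross-checked with simple arithmetic: an open reading frame for N residues needs three bases per residue plus one stop codon. The sketch below applies that to the NOV87 figures quoted above (1067 nucleotides, 343 residues); the function name is illustrative, and the split of the remaining bases between the 5' and 3' ends is not computed here.

```python
def orf_length_nt(residues: int) -> int:
    """Bases needed to encode the residues plus a single stop codon."""
    return 3 * (residues + 1)


total_nt = 1067   # length of the NOV87 nucleic acid, as stated
residues = 343    # length of the NOV87 polypeptide, as stated

coding_nt = orf_length_nt(residues)
flanking_nt = total_nt - coding_nt  # bases outside the open reading frame
print(coding_nt, flanking_nt)
```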



Table 87B. Encoded NOV87 protein sequence (SEQ ID NO:206). 

MFCRPAAPKHRGMSGENVTRVGTFILVGFPTAPGLQYLLFLLFLLTYLFVLVENLAIILTVWSSTSLHRP 
MYYFLSSMSFLEIWYVSDITPKMLEGFLLQQKRISFVGCMTQLYFFSSLVCTECVLLASMAYDRYVAICH
PLRYHVLVTPGLCLQLVGFSFVSGFTISMIKVCFISSVTFCGSNVLNHFFCDISPILKLACTDFSTAELV 
DFILAFIILVFPLLATMLSYAHITLAVLRIPSATGCWRAFFTCASHLTVVTVFYTALLFMYVRPQAIDSR
SSNKLISVLYTVITPILNPLIYCLRNKEFKNALKKAFGLTSCAVEGRLSSLLELHLQIHSQPL 



A search of sequence databases reveals that the NOV87 amino acid sequence has 
211/306 (68%) identity and 250/306 (81%) similarity with TREMBLNEW-ACC:AAG45189
M51 OLFACTORY RECEPTOR - Mus musculus. Public amino acid databases include the 
GenBank databases, SwissProt, PDB and PIR. 

The disclosed NOV87 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 87C. 



Table 87C. BLAST results for NOV87

Gene Index/Identifier                     Protein/Organism                                 Length (aa)  Identity (%)    Positives (%)   Expect
gi|17435880|ref|XP_065375.1| (XM_065375)  similar to olfactory receptor 41 [Homo sapiens]  331          331/331 (100%)  331/331 (100%)  e-150
gi|17435888|ref|XP_065377.1| (XM_065377)  similar to olfactory receptor 41 [Homo sapiens]  312          293/307 (95%)   300/307 (97%)   e-133
gi|18479396|gb|AAL60712.1| (AY073049)     olfactory receptor MOR103-2 [Mus musculus]       312          270/307 (87%)   287/307 (92%)   e-122
gi|18479398|gb|AAL60713.1| (AY073050)     olfactory receptor MOR103-3 [Mus musculus]       312          (85%)           (92%)           e-120
gi|12007416|gb|AAG45189.1| (AF321234)     m51 olfactory receptor [Mus musculus]            314          211/306 (68%)   250/306 (80%)   e-101



Table 87D lists the domain descriptions from DOMAIN analysis results against 
NOV87. This indicates that the NOV87 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 87D. Domain Analysis of NOV87 

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin
family).

CD-Length = 254 residues, 100.0% aligned 

Score = 98.2 bits (243), Expect = 7e-22

The disclosed NOV87 nucleic acid of the invention encoding a GPCR-like protein
includes the nucleic acid whose sequence is provided in Table 87A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 87A while still encoding a protein that maintains
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 5 percent of the
bases may be so changed.

The disclosed NOV87 protein of the invention includes the GPCR-like protein whose
sequence is provided in Table 87B. The invention also includes a mutant or variant protein
any of whose residues may be changed from the corresponding residue shown in Table 87B
while still encoding a protein that maintains its GPCR-like activities and physiological
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 32
percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this GPCR-like protein
(NOV87) may function as a member of a "GPCR family". Therefore, the NOV87 nucleic
acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV87 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein
(NOV87) may be useful in gene therapy, and the GPCR-like protein (NOV87) may be useful
when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD;
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease,
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders;
psoriasis, colon cancer, leukemia, AIDS; thalamus disorders; metabolic disorders including
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic
disorders including pancreatic insufficiency and cancer; and prostate disorders including
prostate cancer, or other pathologies or conditions. The NOV87 nucleic acid encoding the
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV87 nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immunospecifically to the novel NOV87 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV87 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in
assay systems for functional analysis of various human disorders, which will help in
understanding of pathology of the disease and development of new drug targets for various
disorders.
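
By way of nonlimiting illustration, the hydrophobicity-chart prediction referred to above can be sketched as a sliding-window hydropathy scan. The sketch below assumes the Kyte-Doolittle scale, a 9-residue window and a cutoff of -1.0; none of these is specified in this disclosure, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed scale; the text names none).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean hydropathy of every full window along a one-letter sequence."""
    scores = [KD[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophilic_windows(protein, window=9, cutoff=-1.0):
    """0-based start positions of windows whose mean falls below the cutoff."""
    profile = hydropathy_profile(protein, window)
    return [i for i, mean in enumerate(profile) if mean < cutoff]
```

Windows returned by hydrophilic_windows are only candidates for immunogenic regions; the window length and cutoff would need to be chosen for the protein at hand.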

NOV88 

A disclosed NOV88 nucleic acid of 939 nucleotides (also referred to as CG56753-01) 
encoding a GPCR-like protein is shown in Table 88A. The start and stop codons are in bold
letters. 



Table 88A. NOV88 nucleotide sequence (SEQ ID NO:207). 



ATGTCAGGAGAAAATAATTCCTCAGTGACTGAGTTCATTCTGGCTGGGCTCTCAGAACAGCCAGAGCTCC
AGCTGCCCCTCTTCCTCCTGTTCTTAGGAATCTATGTGGTCACAGTGGTGGGCAACCTGGGCATGACCAC 
ACTGATTTGGCTCAGTTCTCACCTGCACACCCCTATGTACTATTTCCTCAGCAGTCTGTCCTTCATTGAC 
TTCTGCCATTCCACTGTCATTACCCCTAAGATGCTGGTGAACTTTGTGACAGAGAAGAACATCATCTCCT 
ACCCTGAATGCATGACTCAGCTCTACTTCTTCCTCGTTTTTGCTATTGCAGAGTGTCACATGTTGGCTGC 
AATGGCGTATGACCGTTACATGGCCATCTGTAGCCCCTTGCTGTACAGTGTCATCATATCCAATAAGGCT
TGCTTTTCTCTGATTTTAGGGGTGTATATAATAGGCCTGGTTTGTGCATCAGTTCATACAGGCTGTATGT 
TTAGGGTTCAATTCTGCAAATTTGATTTGATTAACCATTATTTCTGTGATCTTCTTCCCCTCCTAAAGCT 
CTCTTGCTCTAGTATCTATGTCAACAAACTACTTATTCTATGTGTTGGTGCATTTAACATCCTTGTCCCC 
AGCCTGACCATCCTTTGCTCTTACATCTTTATTATTGCCAGCATCCTCCACATTCGCTCCACTGAGGGCA 
GGTCCAAAGCCTTCAGCACTTGTAGCTCCCACATGTTGGCGGTTGTAATCTTTTTTGGATCTGCAGCATT 
CATGTACTTGCAGCCATCTTCAATCAGCTCCATGGACCAGGGGAAAGTATCCTCTGTGTTTTATACTATT 
ATTGTGCCCATGTTGAACCCTCTGATTTATAGCCTGAGGAATAAAGATGTCCATGTTTCCCTGAAGAAAA 
TGCTAGAGAGAAGAACATTATTGTAAACA



The disclosed NOV88 polypeptide (SEQ ID NO:208) encoded by SEQ ID NO:207 has 
311 amino acid residues and is presented in Table 88B using the one-letter amino acid code. 
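
The relationship between a nucleotide sequence such as that of Table 88A and its encoded polypeptide can be illustrated by a short translation routine. The following is a minimal sketch, assuming the standard genetic code, an uppercase A/C/G/T sequence with whitespace already removed, and a reading frame that begins at the supplied start position; the function name translate is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, start=0):
    """Translate from `start` up to, but not including, the first stop codon."""
    protein = []
    for pos in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # First seven codons of the Table 88A sequence.
    print(translate("ATGTCAGGAGAAAATAATTCC"))  # MSGENNS
```

Applied to the full sequence of Table 88A, the length of the returned string could be compared with the 311 residues stated above.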



Table 88B. Encoded NOV88 protein sequence (SEQ ID NO:208). 



MSGENNSSVTEFILAGLSEQPELQLPLFLLFLGIYVVTVVGNLGMTTLIWLSSHLHTPMYYFLSSLSFID
FCHSTVITPKMLVNFVTEKNIISYPECMTQLYFFLVFAIAECHMLAAMAYDRYMAICSPLLYSVIISNKA
CFSLILGVYIIGLVCASVHTGCMFRVQFCKFDLINHYFCDLLPLLKLSCSSIYVNKLLILCVGAFNILVP
SLTILCSYIFIIASILHIRSTEGRSKAFSTCSSHMLAVVIFFGSAAFMYLQPSSISSMDQGKVSSVFYTI
IVPMLNPLIYSLRNKDVHVSLKKMLQRRTLL



A search of sequence databases reveals that the NOV88 amino acid sequence has 
239/311 (76%) identity and 275/311 (88%) similarity with TREMBLNEW-ACC:AAG39856
ODORANT RECEPTOR K11 - Mus musculus. Public amino acid databases include the
GenBank databases, SwissProt, PDB and PIR. 
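
The identity figure quoted above is a ratio of matching positions to aligned positions. A minimal sketch of that arithmetic follows, assuming two already-aligned sequences of equal length with '-' as the gap character, and assuming that percentages are truncated to whole numbers; the function name is hypothetical. Similarity (positives) additionally requires a substitution matrix and is not computed here.

```python
def percent_identity(aligned_a, aligned_b):
    """Return (matches, columns, percent) for two aligned sequences.

    Gap characters ('-') never count as matches; the percentage is
    truncated to a whole number (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    columns = len(aligned_a)
    return matches, columns, 100 * matches // columns
```

With the counts quoted above, 100 * 239 // 311 evaluates to 76.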

The disclosed NOV88 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 88C.

Table 88C. BLAST results for NOV88

Gene Index/Identifier                             Protein/Organism                        Length (aa)  Identity (%)    Positives (%)   Expect
gi|18578547|ref|XP_090109.1| (XM_090109)          olfactory receptor, family 8,           311          311/311 (100%)  311/311 (100%)  e-141
                                                  subfamily G, member 1 [Homo sapiens]
gi|17472672|ref|XP_061794.1| (XM_061794)          similar to odorant receptor K11         311          263/311 (84%)   285/311 (91%)   e-122
                                                  [Homo sapiens]
gi|18479824|gb|AAL60926.1| (AY073263)             olfactory receptor MOR171-5             314          240/311 (77%)   276/311 (88%)   e-114
                                                  [Mus musculus]
gi|11692519|gb|AAG39856.1|AF282271_1 (AF282271)   odorant receptor K11 [Mus musculus]     314          239/311 (76%)   275/311 (87%)   e-113
gi|17472670|ref|XP_061793.1| (XM_061793)          similar to odorant receptor K15         258          258/258 (100%)  258/258 (100%)  e-113
                                                  [Homo sapiens]
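
The entries in the Gene Index/Identifier column of Table 88C follow a pipe-delimited layout (gi number, source database tag, accession, and an optional locus name). The following sketch splits such an identifier into its fields; the function name and the returned keys are hypothetical, and stray whitespace around the fields is tolerated.

```python
def parse_identifier(identifier):
    """Split a 'gi|number|db|accession|locus' identifier into named fields."""
    fields = [field.strip() for field in identifier.split("|")]
    if len(fields) < 4 or fields[0] != "gi":
        raise ValueError("not a gi-style identifier: %r" % identifier)
    return {
        "gi": fields[1],
        "database": fields[2],
        "accession": fields[3],
        # The fifth field is empty for most entries in the table.
        "locus": fields[4] if len(fields) > 4 and fields[4] else None,
    }
```

The parenthesised nucleotide accession that follows each identifier in the table, such as (XM_090109), is not part of the pipe-delimited string and would have to be handled separately.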



Table 88D lists the domain descriptions from DOMAIN analysis results against 
NOV88. This indicates that the NOV88 sequence has properties similar to those of other 
proteins known to contain this domain.



Table 88D. Domain Analysis of NOV88 

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin
family). CD-Length = 254 residues, 100.0% aligned
Score = 88.6 bits (218), Expect = 5e-19
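
The bit score and Expect value reported in a domain analysis such as Table 88D are related through the size of the search space. As a rough sketch only, assuming the standard relation E = m * n * 2^(-S) for a bit score S, a query of length m and a database of total length n, and ignoring the effective-length corrections applied by actual search programs (the function names are hypothetical):

```python
def expect_from_bits(bit_score, query_len, db_len):
    """Approximate Expect value for a bit score in a search space of m * n."""
    return query_len * db_len * 2.0 ** (-bit_score)


def search_space_from_expect(bit_score, expect):
    """Search-space size m * n implied by a reported bit score and Expect."""
    return expect * 2.0 ** bit_score
```

Under this relation each additional bit of score halves the Expect value, which is why a difference of a few bits between two hits corresponds to a large ratio between their Expect values.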



The disclosed NOV88 nucleic acid of the invention encoding a GPCR-like protein 
includes the nucleic acid whose sequence is provided in Table 88A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed 
from the corresponding base shown in Table 88A while still encoding a protein that maintains
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The 
invention further includes nucleic acids whose sequences are complementary to those just 
described, including nucleic acid fragments that are complementary to any of the nucleic acids 
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or 
complements thereto, whose structures include chemical modifications. Such modifications 
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least 
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 5 percent of the 
bases may be so changed. 
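
The "up to about 5 percent" limit can be expressed as a simple count of changed positions. The sketch below assumes that only substitutions are considered, so that reference and variant have the same length, and treats the limit as an exact 5 percent; both are simplifying assumptions, and the function names are hypothetical.

```python
def count_changes(reference, variant):
    """Number of positions at which the variant differs from the reference."""
    if len(reference) != len(variant):
        raise ValueError("substitution-only comparison needs equal lengths")
    return sum(1 for r, v in zip(reference, variant) if r != v)


def within_percent(reference, variant, max_percent=5):
    """True if no more than max_percent of positions have been changed."""
    changed = count_changes(reference, variant)
    # Integer comparison avoids floating-point rounding at the boundary.
    return changed * 100 <= max_percent * len(reference)
```

For a 939-nucleotide reference such as that of Table 88A, an exact 5 percent limit corresponds to at most 46 substituted bases.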

The disclosed NOV88 protein of the invention includes the GPCR-like protein whose 
sequence is provided in Table 88B. The invention also includes a mutant or variant protein
any of whose residues may be changed from the corresponding residue shown in Table 88B 
while still encoding a protein that maintains its GPCR-like activities and physiological 
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 24
percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this GPCR-like protein
(NOV88) may function as a member of a "GPCR family". Therefore, the NOV88 nucleic
acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV88 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein
(NOV88) may be useful in gene therapy, and the GPCR-like protein (NOV88) may be useful
when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD;
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease,
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders;
psoriasis, colon cancer, leukemia, AIDS; thalamus disorders; metabolic disorders including
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic
disorders including pancreatic insufficiency and cancer; and prostate disorders including
prostate cancer, or other pathologies or conditions. The NOV88 nucleic acid encoding the
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV88 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immunospecifically to the novel NOV88 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV88 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 



NOV89 

A disclosed NOV89 nucleic acid of 1003 nucleotides (also referred to as CG57670-01) 
encoding a GPCR-like protein is shown in Table 89A. The start and stop codons are in bold 
letters. 
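
The start and stop codons of a sequence such as that of Table 89A can also be located computationally. The following is a minimal sketch, assuming the coding region begins at the first ATG that is followed by an in-frame stop codon; the function name and the min_codons parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf(dna, min_codons=1):
    """Locate the first ATG-initiated open reading frame.

    Returns (start, end), where start is the 0-based index of the ATG and
    end is the index just past the stop codon, or None if no such frame
    with at least min_codons codons before the stop is found.
    """
    dna = "".join(dna.split()).upper()  # drop whitespace left by layout
    start = dna.find("ATG")
    while start != -1:
        for pos in range(start, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    return start, pos + 3
                break
        start = dna.find("ATG", start + 1)
    return None
```

Raising min_codons guards against short spurious frames that begin at an ATG upstream of the intended start codon.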



Table 89A. NOV89 nucleotide sequence (SEQ ID NO:209).

TGTTCCATACATTATTTTGTCTTTTGTCTGAAGCAATGCTGAATACAACCTCAGTCACTGAATT
TCTCCTTTTGGGAGTGACAGACATTCAAGAACTGCAGCCTTTTCTCTTCGTTGTTTTCCTTACC 
ATCTACTTCATCAGTGTGGCTGGGAATGGAGCCATTCTGATGATTGTCATCTCTGATCCTAGAC 
TCCATTCCCCTATGTATTTCTTCCTGGGAAACCTGTCCTGCCTGGACATCTGCTACTCCAGCGT 
AACACTGCCAAAAATGCTGCAGAACTTCCTCTCTGCACACAAAGCAATTTCTTTCTTGGGATGC 
ATAAGCCAACTCCATTTCTTCCACTTCCTGGGCAGCACAGAGGCCATGTTGTTGGCCGTGATGG 
CATTTGACCGCTTTGTGGCTATTTGCAAGCCACTTCGCTACACTGTCATTATGAACCCTCAGCT 
CTGTACCCAGATGGCCATCACAATCTGGATGATTGGTTTTTTCCATGCCCTGCTGCACTCCCTA 
ATGACCTCTCGCTTGAACTTCTGTGGTTCTAACCGTATCTATCACTTCTTCTGTGATGTGAAGC 
CATTGCTAAAGCTGAGCCTTAATCAGTGGCTGCTCAGTACTGTCACAGGGACAATCGCCATGGG 
CCCCTTCTTTCTCACATTACTCTCCTATTTCTACATTATCACCCATCTCTTCTTCAAGACTCAT 
TCTTTTAGCATGCTCCGCAAAGCACTGTCCACTTGTGCCTCCCACTTCATGGTAGTTATTCTTT 
TGTATGCACCTGTTCTCTTCACCTATATTCATCATGCCTCAGGGACCTCCATGGACCAGGACCG 
GATCACTGCCATCATGTATACTGTGGTCACTCCAGTACTAAACCCACTGATCTACACTTTGAGG 
AACAAGGAAGTGAAAGGGGCCTTTAATAGAGCAATGAAAAGGTGGCTTTGGCCTAAAGAAATCT 
TGAAGAACTCTTCTGAAGCATAAATAAACAATTAAAAAGATGA 



The disclosed NOV89 polypeptide (SEQ ID NO:210) encoded by SEQ ID NO:209 has 
315 amino acid residues and is presented in Table 89B using the one-letter amino acid code.



Table 89B. Encoded NOV89 protein sequence (SEQ ID NO:210). 

MLNTTSVTEFLLLGVTDIQELQPFLFVVFLTIYFISVAGNGAILMIVISDPRLHSPMYFFLGNLSCLDIC
YSSVTLPKMLQNFLSAHKAISFLGCISQLHFFHFLGSTEAMLIAVHAFDRFVAICKPLRYTVIMNPQLCT
QMAITIWMIGFFHALLHSLMTSRLNFCGSNRIYHFFCDVKPLLKLSLNQWLLSTVTGTIAMGPFFLTLLS
YFYIITHLFFKTHSFSMLRKALSTCASHFMVVILLYAPVLFTYIHHASGTSMDQDRITAIMYTVVTPVLN
PLIYTLRNKEVKGAFNRAMKRWLWPKEILKNSSEA



A search of sequence databases reveals that the NOV89 amino acid sequence has 
similarity with SPTREMBL-ACC:Q9UGF7 BA150A6.1 (NOVEL 7 TRANSMEMBRANE
RECEPTOR (RHODOPSIN FAMILY)(OLFACTORY RECEPTOR LIKE) PROTEIN 
(HS6M1-27)) - Homo sapiens. Public amino acid databases include the GenBank databases, 
SwissProt, PDB and PIR. 

The disclosed NOV89 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 89C.

Table 89C. BLAST results for NOV89

Gene Index/Identifier                             Protein/Organism                        Length (aa)  Identity (%)    Positives (%)   Expect
gi|17464355|ref|XP_069464.1| (XM_069464)          similar to olfactory receptor,          284          279/315 (88%)   280/315 (88%)   e-130
                                                  family 12, subfamily D, ...
                                                  [Homo sapiens]
gi|18563691|ref|XP_094753.1| (XM_094753)          hypothetical protein XP_094753          284          278/315 (88%)   279/315 (88%)   e-130
                                                  [Homo sapiens]
gi|7363443|ref|NP_039224.1| (NM_013936)           olfactory receptor, family 12,          307          269/306 (87%)   281/306 (90%)   e-126
                                                  subfamily D, member 2 [Homo sapiens]
gi|18563689|ref|XP_084201.1| (XM_084201)          similar to olfactory receptor,          307          268/306 (87%)   280/306 (90%)   e-125
                                                  family 12, subfamily D, member 2
                                                  (H. sapiens) [Homo sapiens]
gi|15020328|emb|CAC44545.1| (AL133159)            bM332P19.2 (novel 7 transmembrane       308          241/308 (78%)   269/308 (87%)   e-117
                                                  receptor (rhodopsin family)
                                                  (olfactory receptor like) protein
                                                  (mm17M1-13); ortholog of human
                                                  DJ994E9.8 (HS6M1-20)) [Mus musculus]



Table 89D lists the domain descriptions from DOMAIN analysis results against 
NOV89. This indicates that the NOV89 sequence has properties similar to those of other 
proteins known to contain this domain.

Table 89D. Domain Analysis of NOV89

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin
family).

CD-Length = 254 residues, 100.0% aligned

Score = 101 bits (251), Expect = 7e-23



The disclosed NOV89 nucleic acid of the invention encoding a GPCR-like protein
includes the nucleic acid whose sequence is provided in Table 89A or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base shown in Table 89A while still encoding a protein that maintains
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least
in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 5 percent of the
bases may be so changed.
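
A complementary strand of the kind referred to above, for example one intended for antisense binding, can be derived by reverse-complementing the coding sequence. A minimal sketch, assuming an unmodified DNA sequence written 5' to 3' in the letters A, C, G and T (the function name is hypothetical):

```python
# Watson-Crick pairing for DNA; upper and lower case are both accepted.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the complementary strand, also written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # First eight bases of the coding region in Table 88A.
    print(reverse_complement("ATGTCAGG"))  # CCTGACAT
```

Chemically modified bases or backbones, which the paragraph above also contemplates, are outside the scope of this letter-level sketch.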

The disclosed NOV89 protein of the invention includes the GPCR-like protein whose
sequence is provided in Table 89B. The invention also includes a mutant or variant protein
any of whose residues may be changed from the corresponding residue shown in Table 89B
while still encoding a protein that maintains its GPCR-like activities and physiological
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 5
percent of the residues may be so changed.

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this GPCR-like protein
(NOV89) may function as a member of a "GPCR family". Therefore, the NOV89 nucleic
acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.



The NOV89 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein
(NOV89) may be useful in gene therapy, and the GPCR-like protein (NOV89) may be useful
when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD;
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease,
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders;
psoriasis, colon cancer, leukemia, AIDS; thalamus disorders; metabolic disorders including
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic
disorders including pancreatic insufficiency and cancer; and prostate disorders including
prostate cancer, or other pathologies or conditions. The NOV89 nucleic acid encoding the
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV89 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immunospecifically to the novel NOV89 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV89 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 



NOV90

A disclosed NOV90 nucleic acid of 950 nucleotides (also referred to as CG57676-01)
encoding a GPCR-like protein is shown in Table 90A. The start and stop codons are in bold
letters.



Table 90A. NOV90 nucleotide sequence (SEQ ID NO:211).

GTAATAGGAAATGAATGATGATGGAAAAGTCAATGCTAGCTCTGAGGGGTACTTT
ATTTTAGTTGGATTTTCTAATTGGCCTTATCTGGAAGTAGTTCTCTTTGTGGTTATTTTGATCTTCTGCT 
TGATGACACTGATAGGAAACCTGTTCATCATCATCCTGACGTACCTGGACTCCCATCTCCATACTCCCTT 
GTATTTCTTCCTTTCAAATCTCTCATTTCTGGATCTCTGCTACACCACCAGCTCTATCCCTCAGTTGCTG 
GTCAGTCTCTGGGGTGTGGAAAAGACCATTTCTTATGCTGGTTGCATGGTTCAACTTTACTTTTTTCTCA 
CACTGGGAACCACAGAGTGTGTCCTACTGGTGGTGATGTCCTATGACCGTTATGCAGCTGTGTGTAGACC 
TTTGCATTACACTGTCCTCATGCACTCTCGTTTCTGCCACTTGTTGGCTGTGGCTTCTTGGGTAAGTGGT 
TTTACAAACCCAGCACTTCATTCCTCCTTCACCTTCTGGGTACCTCTGTGTGGACACCGCCAAATAGATC 
ACTTTTTCTGTGAAGTTCCGGCACTTTTAAGATTATCATTTGTCAATACCCGTGAAAATAAACTGACCCT 
CATGATCACAAGCTCCATTTTTGTTCTGCTACTTCTCACCCTCATTTTCACTTCCTATGGTGCTATTGCC 
CAGGCTGTACTGAGGATGCAGTCAACCACTGGGCTTCAGAAAGTATTTGGAACATGTGGAGCTCATCATA
TGGTTGTATCTCTCTTTTTCATTCCGGCCATGTGCATGTATCTCCAGCCACCATCAGGGAATTCTCAAGA 
TCAAGGCAAGTTCATTGCTCTCTTTTATACTGTTGTTACACCTAGTCTTAACCCTCTAATCTACACCCTC
AGAAACAAAGATGTAAGAGGGGTAGTGAAGAGACTAAGGGGGTGGGAGTGAGCCT



The disclosed NOV90 polypeptide (SEQ ID NO:212) encoded by SEQ ID NO:211 has
311 amino acid residues and is presented in Table 90B using the one-letter amino acid code.



Table 90B. Encoded NOV90 protein sequence (SEQ ID NO:212). 

MNDDGKVNASSEGYFILVGFSNWPYLEVVLFVVILIFCLMTLIGNLFIIILTYLDSHLHTPLYFFLSNLS
FLDLCYTTSSIPQLLVSLWGVEKTISYAGCMVQLYFFLTLGTTECVLLVVMSYDRYAAVCRPLHYTVLMH
SRFCHLLAVASWVSGFTNPALHSSFTFWVPLCGHRQIDHFFCEVPALLRLSFVNTRENKLTLMITSSIFV
LLLLTLIFTSYGAIAQAVLRMQSTTGLQKVFGTCGAHHMVVSLFFIPAMCMYLQPPSGNSQDQGKFIALF
YTVVTPSLNPLIYTLRNKDVRGVVKRLRGWE



A search of sequence databases reveals that the NOV90 amino acid sequence has 
similarity with TREMBLNEW-ACC:CAC20478 OLFACTORY RECEPTOR - Homo 
sapiens. Public amino acid databases include the GenBank databases, SwissProt, PDB and
PIR.

The disclosed NOV90 polypeptide has homology to the amino acid sequences shown
in the BLASTP data listed in Table 90C.

Table 90C. BLAST results for NOV90

Gene Index/Identifier                             Protein/Organism                        Length (aa)  Identity (%)    Positives (%)   Expect
gi|12054347|emb|CAC20478.1| (AJ302558)            olfactory receptor [Homo sapiens]       311          281/311 (90%)   293/311 (93%)   e-139
gi|12054345|emb|CAC20477.1| (AJ302557)            olfactory receptor [Homo sapiens]       311          280/311 (90%)   293/311 (94%)   e-139
gi|12140469|emb|CAC21440.1| (AJ302552)            olfactory receptor [Homo sapiens]       311          280/311 (90%)   292/311 (93%)   e-139
gi|14423775|sp|O76001|O2J3_HUMAN                  OLFACTORY RECEPTOR 2J3 (OLFACTORY       311          280/311 (90%)   293/311 (94%)   e-139
                                                  RECEPTOR 6-6) (OR6-6)
gi|18564769|ref|XP_069457.2| (XM_069457)          similar to OLFACTORY RECEPTOR 2J3       391          270/298 (90%)   283/298 (94%)   e-136
                                                  (OLFACTORY RECEPTOR 6-6) (OR6-6)
                                                  (HS6M1-3) [Homo sapiens]

Table 90D lists the domain descriptions from DOMAIN analysis results against 
NOV90. This indicates that the NOV90 sequence has properties similar to those of other 
proteins known to contain this domain.

Table 90D. Domain Analysis of NOV90

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin
family). CD-Length = 254 residues, 100.0% aligned

Score = 89.7 bits (221), Expect = 2e-19

The disclosed NOV90 nucleic acid of the invention encoding a GPCR-like protein 
includes the nucleic acid whose sequence is provided in Table 90A or a fragment thereof. The 
invention also includes a mutant or variant nucleic acid any of whose bases may be changed 
from the corresponding base shown in Table 90A while still encoding a protein that maintains 
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The 
invention further includes nucleic acids whose sequences are complementary to those just 
described, including nucleic acid fragments that are complementary to any of the nucleic acids 
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or 
complements thereto, whose structures include chemical modifications. Such modifications 
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least 
in part to enhance the chemical stability of the modified nucleic acid, such that they may be 
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.
In the mutant or variant nucleic acids, and their complements, up to about 5 percent of the 
bases may be so changed. 

The disclosed NOV90 protein of the invention includes the GPCR-like protein whose 
sequence is provided in Table 90B. The invention also includes a mutant or variant protein 
any of whose residues may be changed from the corresponding residue shown in Table 90B
while still encoding a protein that maintains its GPCR-like activities and physiological
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 5 
percent of the residues may be so changed. 

The invention further encompasses antibodies and antibody fragments, such as Fab or
(Fab)2, that bind immunospecifically to any of the proteins of the invention.

The above defined information for this invention suggests that this GPCR-like protein
(NOV90) may function as a member of a "GPCR family". Therefore, the NOV90 nucleic
acids and proteins identified here may be useful in potential therapeutic applications
implicated in (but not limited to) various pathologies and disorders as indicated below. The
potential therapeutic applications for this invention include, but are not limited to: protein
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues
and cell types composing (but not limited to) those defined here.

The NOV90 nucleic acids and proteins of the invention are useful in potential
therapeutic applications implicated in diseases including but not limited to various pathologies
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein
(NOV90) may be useful in gene therapy, and the GPCR-like protein (NOV90) may be useful
when administered to a subject in need thereof. By way of nonlimiting example, the
compositions of the present invention will have efficacy for treatment of patients suffering
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD;
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease,
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders;
psoriasis, colon cancer, leukemia, AIDS; thalamus disorders; metabolic disorders including
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic
disorders including pancreatic insufficiency and cancer; and prostate disorders including
prostate cancer, or other pathologies or conditions. The NOV90 nucleic acid encoding the
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic
applications, wherein the presence or amount of the nucleic acid or the protein is to be
assessed.

NOV90 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immunospecifically to the novel NOV90 substances for use in
therapeutic or diagnostic methods. These antibodies may be generated according to methods
known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOV90 proteins have multiple hydrophilic
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 






NOV91 

A disclosed NOV91 nucleic acid of 967 nucleotides (also referred to as CG57668-01) 
encoding a GPCR-like protein is shown in Table 91A. The start and stop codons are in bold
letters. 



Table 91A. NOV91 nucleotide sequence (SEQ ID NO:213). 



GCATTTGCCCCAGTAGCTATGATTATAATTTGCAATGACAGCCACAGTGATTTCATCCTTCTGG
GCTTCTCTAACAAGCCACATTTGGAGAAGATACTTTTTTGGATCATTTTTATTTTTTATTTTTT 
GACTCTTGCAGGAAATATGGTCATAGTTCTTGTGTCCTTGAAGGATCCAAAACTCCACATCCCT 
ATGTATTTCTTTCTTTCCAACCTTTCCTTGGTAGACCTCTGTTTGACCAGCAGCTGTGTTCCAC 
AGATGTTGATTAACTTCTGGGGCCCAGAAAAGACCATCAGCTACATTGGCTGTGCCATTCAACT 
CTATGTTTTTTTGTGGCTTGGGGCCACGGAATATGTCCTTCTTGTTGTCATGGCTGTGGATTGT 
TATGTAGCAGTGTGTCATCCACTGCAAAATACCATGATCATGCACCCAAAACTTTGTCTGCAGC 
TGGCTATCTTGGCATGGGGGACTGGCTTGGCCCAGTCTCTGATCCAGTCCCCTGCCACCCTCCG 
GTTACCCTTCTGCTCCCAGCGGATGGTGGATGATGTTGTTTGTGAAGTCCCAGCTCTGATTCAG 
CTCTCCAGTACTGATACTACCTACAGTGAAATTCAGATGTCTATCGCCAGTGTTGTCCTCCTGG 
TGATGCCCTTGATCATTATCCTTTCCTCTTCTGGTGCTATTGCTAAGGCTGTGCTGAGAATTAA 
GTCAACTGCAGGACAGAAGAAAGCATTTGGCACCTGCATCTCTCACCTTCTTGTGGTTTCTCTC 
TTTTATGGCACTGTCACAGGTGTCTACCTTCAACCAAAAAATCACTATCCTCATGAATGGGGCA 
AATTTCTCACTCTTTTCTACACTGTAGTAACCCCAACTCTTAATCCCCTCATCTACACTCTAAG 
GAACAAGGAGGTAAAGGGAGCACTAATAAGATTGGGGAGGAGGACCTGGGATTCCCAGAATAAC 
TAACAAG 






The disclosed NOV91 polypeptide (SEQ ID NO:214) encoded by SEQ ID NO:213 
has 314 amino acid residues and is presented in Table 91B using the one-letter amino acid
code. 



Table 91B. Encoded NOV91 protein sequence (SEQ ID NO:214). 



MIIICNDSHSDFILLGFSNKPHLEKILFWIIFIFYFLTLAGNMVIVLVSLKDPKLHIPMYFFLSNLSLVD
LCLTSSCVPQMLINFWGPEKTISYIGCAIQLYVFLWLGATEYVLLVVMAVDCYVAVCHPLQNTMIMHPKL
CLQIAIIAWGTGLAQSLIQSPATLRLPFCSQRMVDDVVCEVPALIQLSSTDTTYSEIQMSIASVVLLVMP
LIIILSSSGAIAKAVLRIKSTAGQKKAFGTCISHLLVVSLFYGTVTGVYLQPKNHYPHEWGKFLTLFYTV
VTPTLNPLIYTLRNKEVKGALIRLGRRTWDSQNN



A search of sequence databases reveals that the NOV91 amino acid sequence has 
188/303 (62%) identity and 232/303 (76%) similarity with SWISSNEW-ACC:Q15062 
OLFACTORY RECEPTOR 2H3 (OLFACTORY RECEPTOR-LIKE PROTEIN FAT11) - 
Homo sapiens. Public amino acid databases include the GenBank databases, SwissProt, PDB 
and PIR. 

The disclosed NOV91 polypeptide has homology to the amino acid sequences shown 
in the BLASTP data listed in Table 91C. 



Table 91C. BLAST results for NOV91 



Gene Index/Identifier                           Protein/Organism                                                Length (aa)  Identity (%)   Positives (%)  Expect
gi|17464345|ref|XP_069459.1| (XM_069459)        similar to olfactory receptor [Homo sapiens]                    470          217/221 (98%)  219/221 (98%)  e-111
gi|18565396|ref|XP_094938.1| (XM_094938)        216/221 (97%), Positives = 218/221 (97%)                        289          216/221 (97%)  218/221 (97%)  e-111
gi|12231029|sp|Q15062|O2H3_HUMAN                Olfactory receptor 2H3 (Olfactory receptor-like protein FAT11)  316          188/303 (62%)  232/303 (76%)  6e-99
gi|9798922|gb|AAF98753.1|AF211941_1 (AF211941)  olfactory receptor [Homo sapiens]                               303          187/293 (63%)  228/293 (76%)  2e-97
gi|14423783|sp|O95918|O2H2_HUMAN                Olfactory receptor 2H2 (Hs6M1-12)                               312          187/302 (61%)  230/302 (75%)  3e-97



Table 91D lists the domain descriptions from DOMAIN analysis results against 
NOV91. This indicates that the NOV91 sequence has properties similar to those of other 
proteins known to contain this domain. 



Table 91D. Domain Analysis of NOV91 

gnl|Pfam|pfam00001, 7tm_1, 7 transmembrane receptor (rhodopsin family). 

CD-Length = 254 residues, 100.0% aligned 
Score = 98.6 bits (244), Expect = 5e-22 



The disclosed NOV91 nucleic acid of the invention encoding a GPCR-like protein 
includes the nucleic acid whose sequence is provided in Table 91A or a fragment thereof. The 
invention also includes a mutant or variant nucleic acid any of whose bases may be changed 
from the corresponding base shown in Table 91A while still encoding a protein that maintains 
its GPCR-like activities and physiological functions, or a fragment of such a nucleic acid. The 
invention further includes nucleic acids whose sequences are complementary to those just 
described, including nucleic acid fragments that are complementary to any of the nucleic acids 
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or 
complements thereto, whose structures include chemical modifications. Such modifications 
include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least 
in part to enhance the chemical stability of the modified nucleic acid, such that they may be 
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. 
In the mutant or variant nucleic acids, and their complements, up to about 5 percent of the 
bases may be so changed. 

The disclosed NOV91 protein of the invention includes the GPCR-like protein whose 
sequence is provided in Table 91B. The invention also includes a mutant or variant protein 
any of whose residues may be changed from the corresponding residue shown in Table 91B 
while still encoding a protein that maintains its GPCR-like activities and physiological 
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 38 
percent of the residues may be so changed. 
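
By way of nonlimiting illustration only, the "up to about 5 percent" and "up to about 38 percent" limits stated above can be converted into approximate absolute counts for the 967-nucleotide and 314-residue NOV91 sequences. The helper name `max_changes` is hypothetical, and rounding down is an assumption, since the text says only "about".

```python
def max_changes(length: int, percent: int) -> int:
    """Largest whole number of positions that does not exceed `percent` of `length`.

    Rounding down is an assumption made for this sketch; the specification
    gives the limits only as "up to about" a percentage.
    """
    if length < 0 or not 0 <= percent <= 100:
        raise ValueError("length must be non-negative and percent within 0..100")
    return length * percent // 100


if __name__ == "__main__":
    # 967 nucleotides in Table 91A, 314 residues in Table 91B.
    print("bases that may differ:   ", max_changes(967, 5))
    print("residues that may differ:", max_changes(314, 38))
```

Under these assumptions the limits correspond to roughly 48 bases and 119 residues.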

The invention further encompasses antibodies and antibody fragments, such as Fab or 
(Fab)2, that bind immunospecifically to any of the proteins of the invention. 

The above defined information for this invention suggests that this GPCR-like protein 
(NOV91) may function as a member of a "GPCR family". Therefore, the NOV91 nucleic 
acids and proteins identified here may be useful in potential therapeutic applications 
implicated in (but not limited to) various pathologies and disorders as indicated below. The 
potential therapeutic applications for this invention include, but are not limited to: protein 
therapeutic, small molecule drug target, antibody target (therapeutic, diagnostic, drug 
targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene therapy (gene 
delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro of all tissues 
and cell types composing (but not limited to) those defined here. 

The NOV91 nucleic acids and proteins of the invention are useful in potential 
therapeutic applications implicated in diseases including but not limited to various pathologies 
and disorders as indicated below. For example, a cDNA encoding the GPCR-like protein 
(NOV91) may be useful in gene therapy, and the GPCR-like protein (NOV91) may be useful 
when administered to a subject in need thereof. By way of nonlimiting example, the 
compositions of the present invention will have efficacy for treatment of patients suffering 
from CNS disorders, brain disorders including epilepsy, eating disorders, schizophrenia, ADD; 
cancer; heart disease; inflammation and autoimmune disorders including Crohn's disease, 
IBD, allergies, rheumatoid and osteoarthritis, inflammatory skin disorders, blood disorders; 
psoriasis, colon cancer, leukemia, AIDS; thalamus disorders; metabolic disorders including 
diabetes and obesity; lung diseases such as asthma, emphysema, cystic fibrosis; pancreatic 
disorders including pancreatic insufficiency and cancer; and prostate disorders including 
prostate cancer, or other pathologies or conditions. The NOV91 nucleic acid encoding the 
GPCR-like protein of the invention, or fragments thereof, may further be useful in diagnostic 
applications, wherein the presence or amount of the nucleic acid or the protein is to be 
assessed. 

NOV91 nucleic acids and polypeptides are further useful in the generation of 
antibodies that bind immuno-specifically to the novel NOV91 substances for use in 
therapeutic or diagnostic methods. These antibodies may be generated according to methods 
known in the art, using prediction from hydrophobicity charts, as described in the 
"Anti-NOVX Antibodies" section below. The disclosed NOV91 proteins have multiple hydrophilic 
regions, each of which can be used as an immunogen. These novel proteins can be used in 
assay systems for functional analysis of various human disorders, which will help in 
understanding of pathology of the disease and development of new drug targets for various 
disorders. 



NOVX Nucleic Acids and Polypeptides 

One aspect of the invention pertains to isolated nucleic acid molecules that encode 
NOVX polypeptides or biologically active portions thereof. Also included in the invention are 
nucleic acid fragments sufficient for use as hybridization probes to identify NOVX-encoding 
nucleic acids (e.g., NOVX mRNAs) and fragments for use as PCR primers for the 
amplification and/or mutation of NOVX nucleic acid molecules. As used herein, the term 
"nucleic acid molecule" is intended to include DNA molecules (e.g., cDNA or genomic 
DNA), RNA molecules (e.g., mRNA), analogs of the DNA or RNA generated using 
nucleotide analogs, and derivatives, fragments and homologs thereof. The nucleic acid 
molecule may be single-stranded or double-stranded, but preferably comprises 
double-stranded DNA. 

An NOVX nucleic acid can encode a mature NOVX polypeptide. As used herein, a 
"mature" form of a polypeptide or protein disclosed in the present invention is the product of a 
naturally occurring polypeptide or precursor form or proprotein. The naturally occurring 
polypeptide, precursor or proprotein includes, by way of nonlimiting example, the full-length 
gene product, encoded by the corresponding gene. Alternatively, it may be defined as the 
polypeptide, precursor or proprotein encoded by an ORF described herein. The product 
"mature" form arises, again by way of nonlimiting example, as a result of one or more 
naturally occurring processing steps as they may take place within the cell, or host cell, in 
which the gene product arises. Examples of such processing steps leading to a "mature" form 
of a polypeptide or protein include the cleavage of the N-terminal methionine residue encoded 
by the initiation codon of an ORF, or the proteolytic cleavage of a signal peptide or leader 
sequence. Thus a mature form arising from a precursor polypeptide or protein that has 
residues 1 to N, where residue 1 is the N-terminal methionine, would have residues 2 through 
N remaining after removal of the N-terminal methionine. Alternatively, a mature form arising 
from a precursor polypeptide or protein having residues 1 to N, in which an N-terminal signal 
sequence from residue 1 to residue M is cleaved, would have the residues from residue M+1 to 
residue N remaining. Further as used herein, a "mature" form of a polypeptide or protein may 
arise from a step of post-translational modification other than a proteolytic cleavage event. 
Such additional processes include, by way of non-limiting example, glycosylation, 
myristoylation or phosphorylation. In general, a mature polypeptide or protein may result 
from the operation of only one of these processes, or a combination of any of them. 
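
The two residue-numbering cases described above (residues 2 through N after removal of the N-terminal methionine, and residues M+1 through N after cleavage of a signal sequence spanning residues 1 to M) can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the value M = 4 used in the example is arbitrary and does not assert that NOV91 carries a signal peptide.

```python
def remove_initiator_met(precursor: str) -> str:
    """Return residues 2..N when residue 1 is the N-terminal methionine."""
    return precursor[1:] if precursor.startswith("M") else precursor


def cleave_signal_sequence(precursor: str, m: int) -> str:
    """Return residues M+1..N after removing a signal sequence spanning residues 1..M.

    Residue numbering is 1-based, as in the text; Python slices are 0-based,
    so dropping the first `m` characters leaves residue M+1 at the front.
    """
    if not 0 <= m <= len(precursor):
        raise ValueError("m must lie between 0 and the precursor length")
    return precursor[m:]


if __name__ == "__main__":
    precursor = "MIIICNDSHS"  # first ten residues of Table 91B
    print(remove_initiator_met(precursor))       # residues 2..10
    print(cleave_signal_sequence(precursor, 4))  # residues 5..10 (hypothetical M = 4)
```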

The term "probes", as utilized herein, refers to nucleic acid sequences of variable 
length, preferably between at least about 10 nucleotides (nt), 100 nt, or as many as 
approximately, e.g., 6,000 nt, depending upon the specific use. Probes are used in the 
detection of identical, similar, or complementary nucleic acid sequences. Longer length 
probes are generally obtained from a natural or recombinant source, are highly specific, and 
much slower to hybridize than shorter-length oligomer probes. Probes may be single- or 
double-stranded and designed to have specificity in PCR, membrane-based hybridization 
technologies, or ELISA-like technologies. 

The term "isolated" nucleic acid molecule, as utilized herein, is one which is separated 
from other nucleic acid molecules which are present in the natural source of the nucleic acid. 
Preferably, an "isolated" nucleic acid is free of sequences which naturally flank the nucleic 
acid (i.e., sequences located at the 5'- and 3'-termini of the nucleic acid) in the genomic DNA 
of the organism from which the nucleic acid is derived. For example, in various embodiments, 
the isolated NOVX nucleic acid molecules can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 
kb, 0.5 kb or 0.1 kb of nucleotide sequences which naturally flank the nucleic acid molecule in 
genomic DNA of the cell/tissue from which the nucleic acid is derived (e.g., brain, heart, liver, 
spleen, etc.). Moreover, an "isolated" nucleic acid molecule, such as a cDNA molecule, can 
be substantially free of other cellular material or culture medium when produced by 
recombinant techniques, or of chemical precursors or other chemicals when chemically 
synthesized. 

A nucleic acid molecule of the invention, e.g., a nucleic acid molecule having the 
nucleotide sequence SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, and 31, 
or a complement of this aforementioned nucleotide sequence, can be isolated using standard 
molecular biology techniques and the sequence information provided herein. Using all or a 
portion of the nucleic acid sequence of SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 
25, 27, 29, and 31 as a hybridization probe, NOVX molecules can be isolated using standard 
hybridization and cloning techniques (e.g., as described in Sambrook, et al., (eds.), 
Molecular Cloning: A Laboratory Manual 2nd Ed., Cold Spring Harbor Laboratory 
Press, Cold Spring Harbor, NY, 1989; and Ausubel, et al., (eds.), Current Protocols in 
Molecular Biology, John Wiley & Sons, New York, NY, 1993). 

A nucleic acid of the invention can be amplified using cDNA, mRNA or alternatively, 
genomic DNA, as a template and appropriate oligonucleotide primers according to standard 
PCR amplification techniques. The nucleic acid so amplified can be cloned into an 
appropriate vector and characterized by DNA sequence analysis. Furthermore, 
oligonucleotides corresponding to NOVX nucleotide sequences can be prepared by standard 
synthetic techniques, e.g., using an automated DNA synthesizer. 

As used herein, the term "oligonucleotide" refers to a series of linked nucleotide 
residues, which oligonucleotide has a sufficient number of nucleotide bases to be used in a 
PCR reaction. A short oligonucleotide sequence may be based on, or designed from, a 
genomic or cDNA sequence and is used to amplify, confirm, or reveal the presence of an 
identical, similar or complementary DNA or RNA in a particular cell or tissue. 
Oligonucleotides comprise portions of a nucleic acid sequence having about 10 nt, 50 nt, or 
100 nt in length, preferably about 15 nt to 30 nt in length. In one embodiment of the 
invention, an oligonucleotide comprising a nucleic acid molecule less than 100 nt in length 
would further comprise at least 6 contiguous nucleotides of SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 
15, 17, 19, 21, 23, 25, 27, 29, and 31, or a complement thereof. Oligonucleotides may be 
chemically synthesized and may also be used as probes. 
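
As a minimal sketch of the "at least 6 contiguous nucleotides" criterion described above, the following check reports whether an oligonucleotide shares a run of k contiguous bases with a reference sequence. The function name and the example oligonucleotides are hypothetical; the reference is the first 33 nucleotides of Table 91A, and only the sense strand is examined (the "or a complement thereof" case would require testing the complementary strand as well).

```python
def shares_contiguous_run(oligo: str, reference: str, k: int = 6) -> bool:
    """True if `oligo` and `reference` have at least `k` contiguous bases in common."""
    oligo, reference = oligo.upper(), reference.upper()
    if k <= 0:
        raise ValueError("k must be positive")
    if len(oligo) < k or len(reference) < k:
        return False
    # Index every k-mer of the reference once, then probe with the oligo's k-mers.
    reference_kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return any(oligo[i:i + k] in reference_kmers for i in range(len(oligo) - k + 1))


if __name__ == "__main__":
    reference = "GCATTTGCCCCAGTAGCTATGATTATAATTTGC"  # start of Table 91A
    print(shares_contiguous_run("TTTTCCCCAGAAAA", reference))  # shares CCCCAG
    print(shares_contiguous_run("AAAAAAAAAA", reference))      # no shared 6-mer
```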

In another embodiment, an isolated nucleic acid molecule of the invention comprises a 
nucleic acid molecule that is a complement of the nucleotide sequence shown in SEQ ID 
NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, and 31, or a portion of this nucleotide 
sequence (e.g., a fragment that can be used as a probe or primer or a fragment encoding a 
biologically-active portion of an NOVX polypeptide). A nucleic acid molecule that is 
complementary to the nucleotide sequence shown in SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 
19, 21, 23, 25, 27, 29, or 31 is one that is sufficiently complementary to the nucleotide 
sequence shown in SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, or 31 that it 
can hydrogen bond with little or no mismatches to the nucleotide sequence shown in SEQ ID 
NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, and 31, thereby forming a stable 
duplex. 
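
The notion of a complement that pairs "with little or no mismatches" can be illustrated, under simplifying assumptions, by computing the Watson-Crick reverse complement of a strand and counting the positions at which a candidate strand departs from it. The function names are hypothetical, only the four unmodified DNA bases are handled, and the example strand is the first 18 nucleotides of Table 91A.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Watson-Crick reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_mismatches(strand: str, candidate: str) -> int:
    """Number of positions at which `candidate` differs from the perfect
    antiparallel complement of `strand`. Both must have the same length."""
    if len(strand) != len(candidate):
        raise ValueError("strands must be of equal length")
    perfect = reverse_complement(strand)
    return sum(a != b for a, b in zip(perfect, candidate.upper()))


if __name__ == "__main__":
    strand = "GCATTTGCCCCAGTAGCT"  # first 18 nt of Table 91A
    print(reverse_complement(strand))
    print(count_mismatches(strand, "AGCTACTGGGGCAAATGC"))  # perfect complement
    print(count_mismatches(strand, "AGCTACTGGGGCAAATGA"))  # one mismatch
```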

As used herein, the term "complementary" refers to Watson-Crick or Hoogsteen base 
pairing between nucleotide units of a nucleic acid molecule, and the term "binding" means 
the physical or chemical interaction between two polypeptides or compounds or associated 
polypeptides or compounds or combinations thereof. Binding includes ionic, non-ionic, van 
der Waals, hydrophobic interactions, and the like. A physical interaction can be either direct 
or indirect. Indirect interactions may be through or due to the effects of another polypeptide or 
compound. Direct binding refers to interactions that do not take place through, or due to, the 
effect of another polypeptide or compound, but instead are without other substantial chemical 
intermediates. 

Fragments provided herein are defined as sequences of at least 6 (contiguous) nucleic 
acids or at least 4 (contiguous) amino acids, a length sufficient to allow for specific 
hybridization in the case of nucleic acids or for specific recognition of an epitope in the case of 
amino acids, respectively, and are at most some portion less than a full length sequence. 
Fragments may be derived from any contiguous portion of a nucleic acid or amino acid 
sequence of choice. Derivatives are nucleic acid sequences or amino acid sequences formed 
from the native compounds either directly or by modification or partial substitution. Analogs 
are nucleic acid sequences or amino acid sequences that have a structure similar to, but not 
identical to, the native compound but differ from it in respect to certain components or side 
chains. Analogs may be synthetic or from a different evolutionary origin and may have a 
similar or opposite metabolic activity compared to wild type. Homologs are nucleic acid 
sequences or amino acid sequences of a particular gene that are derived from different species. 
Derivatives and analogs may be full length or other than full length, if the derivative or 
analog contains a modified nucleic acid or amino acid, as described below. Derivatives or 
analogs of the nucleic acids or proteins of the invention include, but are not limited to, 
molecules comprising regions that are substantially homologous to the nucleic acids or 
proteins of the invention, in various embodiments, by at least about 70%, 80%, or 95% 
identity (with a preferred identity of 80-95%) over a nucleic acid or amino acid sequence of 
identical size or when compared to an aligned sequence in which the alignment is done by a 
computer homology program known in the art, or whose encoding nucleic acid is capable of 
hybridizing to the complement of a sequence encoding the aforementioned proteins under 
stringent, moderately stringent, or low stringent conditions. See e.g. Ausubel, et al., Current 
Protocols in Molecular Biology, John Wiley & Sons, New York, NY, 1993, and below. 
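
The percent identity thresholds recited above (about 70%, 80%, or 95% over sequences of identical size) can be illustrated with a short calculation. This sketch assumes the two sequences are already aligned and of equal length; it does not perform the alignment itself, which the text leaves to a computer homology program. The function names and the example peptides are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions holding the same residue in both sequences.

    Both inputs must already be aligned to the same length; a position with a
    gap character '-' in either sequence is counted as a non-match.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of identical size")
    matches = sum(x == y and x != "-" for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_identity_threshold(a: str, b: str, threshold: float = 70.0) -> bool:
    """True if the aligned sequences are at least `threshold` percent identical."""
    return percent_identity(a, b) >= threshold


if __name__ == "__main__":
    native = "MIIICNDSHS"   # first ten residues of Table 91B
    variant = "MILICNDAHS"  # hypothetical variant with two substitutions
    print(percent_identity(native, variant))
    print(meets_identity_threshold(native, variant, 80.0))
    print(meets_identity_threshold(native, variant, 95.0))
```

The same arithmetic underlies the BLAST figures in Table 91C, where, for example, 188 identical positions out of 303 aligned positions is reported as 62%.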

A "homologous nucleic acid sequence" or "homologous amino acid sequence," or 
variations thereof, refer to sequences characterized by a homology at the nucleotide level or 
amino acid level as discussed above. Homologous nucleotide sequences encode those 
sequences coding for isoforms of NOVX polypeptides. Isoforms can be expressed in different 
tissues of the same organism as a result of, for example, alternative splicing of RNA. 
Alternatively, isoforms can be encoded by different genes. In the invention, homologous 
nucleotide sequences include nucleotide sequences encoding for an NOVX polypeptide of 
species other than humans, including, but not limited to: vertebrates, and thus can include, e.g., 
frog, mouse, rat, rabbit, dog, cat, cow, horse, and other organisms. Homologous nucleotide 
sequences also include, but are not limited to, naturally occurring allelic variations and 
mutations of the nucleotide sequences set forth herein. A homologous nucleotide sequence 
does not, however, include the exact nucleotide sequence encoding human NOVX protein. 
Homologous nucleic acid sequences include those nucleic acid sequences that encode 
conservative amino acid substitutions (see below) in SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 
19, 21, 23, 25, 27, 29, and 31, as well as a polypeptide possessing NOVX biological activity. 
Various biological activities of the NOVX proteins are described below. 

An NOVX polypeptide is encoded by the open reading frame ("ORF") of an NOVX 
nucleic acid. An ORF corresponds to a nucleotide sequence that could potentially be translated 
into a polypeptide. A stretch of nucleic acids comprising an ORF is uninterrupted by a stop 
codon. An ORF that represents the coding sequence for a full protein begins with an ATG 
"start" codon and terminates with one of the three "stop" codons, namely, TAA, TAG, or 
TGA. For the purposes of this invention, an ORF may be any part of a coding sequence, with 
or without a start codon, a stop codon, or both. For an ORF to be considered as a good 
candidate for coding for a bona fide cellular protein, a minimum size requirement is often set, 
e.g., a stretch of DNA that would encode a protein of 50 amino acids or more. 
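
The ORF definition above (an ATG start codon, an uninterrupted run of codons, one of the stop codons TAA, TAG, or TGA, and an optional minimum size such as 50 amino acids) can be sketched as follows. This is an illustrative sketch only, assuming the standard genetic code and scanning only from the first ATG on the given strand; the function names are hypothetical, and the example fragment is the first 64 nucleotides of Table 91A.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built by enumerating codons in T, C, A, G order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translate_first_orf(dna: str) -> tuple[str, bool]:
    """Translate from the first ATG until a stop codon or the end of the input.

    Returns (protein, stop_found). Whitespace is ignored; bases other than
    A, C, G, T are not supported and raise KeyError.
    """
    dna = "".join(dna.split()).upper()
    start = dna.find("ATG")
    if start < 0:
        return "", False
    residues = []
    for i in range(start, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon in STOP_CODONS:
            return "".join(residues), True
        residues.append(CODON_TABLE[codon])
    return "".join(residues), False


def is_candidate_orf(protein: str, stop_found: bool, min_aa: int = 50) -> bool:
    """Apply the minimum-size rule of thumb (e.g., 50 amino acids or more)."""
    return stop_found and len(protein) >= min_aa


if __name__ == "__main__":
    fragment = "GCATTTGCCCCAGTAGCTATGATTATAATTTGCAATGACAGCCACAGTGATTTCATCCTTCTGG"
    print(translate_first_orf(fragment))  # -> ('MIIICNDSHSDFILL', False)
```

Because the fragment ends before any stop codon, the second returned value is False; applying the same function to the complete sequence of Table 91A would be expected to reach the terminal stop codon.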

The nucleotide sequences determined from the cloning of the human NOVX genes 
allow for the generation of probes and primers designed for use in identifying and/or cloning 
NOVX homologues in other cell types, e.g. from other tissues, as well as NOVX homologues 
from other vertebrates. The probe/primer typically comprises substantially purified 
oligonucleotide. The oligonucleotide typically comprises a region of nucleotide sequence that 
hybridizes under stringent conditions to at least about 12, 25, 50, 100, 150, 200, 250, 300, 350 
or 400 consecutive sense strand nucleotide sequence of SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 
19, 21, 23, 25, 27, 29, or 31; or an anti-sense strand nucleotide sequence of SEQ ID NOS:1, 3, 
5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, or 31; or of a naturally occurring mutant of SEQ 
ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, and 31. 

Probes based on the human NOVX nucleotide sequences can be used to detect 
transcripts or genomic sequences encoding the same or homologous proteins. In various 
embodiments, the probe further comprises a label group attached thereto, e.g. the label group 
can be a radioisotope, a fluorescent compound, an enzyme, or an enzyme co-factor. Such 
probes can be used as a part of a diagnostic test kit for identifying cells or tissues which 
mis-express an NOVX protein, such as by measuring a level of an NOVX-encoding nucleic acid in 
a sample of cells from a subject, e.g., detecting NOVX mRNA levels or determining whether a 
genomic NOVX gene has been mutated or deleted. 

"A polypeptide having a biologically-active portion of an NOVX polypeptide" refers 
to polypeptides exhibiting activity similar, but not necessarily identical to, an activity of a 
polypeptide of the invention, including mature forms, as measured in a particular biological 
assay, with or without dose dependency. A nucleic acid fragment encoding a 
"biologically-active portion of NOVX" can be prepared by isolating a portion of SEQ ID 
NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, or 31, that encodes a polypeptide 
having an NOVX 
biological activity (the biological activities of the NOVX proteins are described below), 
expressing the encoded portion of NOVX protein (e.g., by recombinant expression in vitro) 
and assessing the activity of the encoded portion of NOVX. 

NOVX Nucleic Acid and Polypeptide Variants 

The invention further encompasses nucleic acid molecules that differ from the 
nucleotide sequences shown in SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 
29, and 31 due to degeneracy of the genetic code and thus encode the same NOVX proteins as 
that encoded by the nucleotide sequences shown in SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 
19, 21, 23, 25, 27, 29, and 31. In another embodiment, an isolated nucleic acid molecule of 
the invention has a nucleotide sequence encoding a protein having an amino acid sequence 
shown in SEQ ID NOS:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, or 32. 
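
Degeneracy of the genetic code, as referred to above, means that distinct nucleotide sequences can encode the same protein. The following sketch, assuming the standard genetic code, translates the first five codons of the NOV91 coding sequence from Table 91A alongside a hypothetical variant that differs at four third-codon positions. The variant sequence and the function names are illustrative assumptions and are not taken from the disclosure.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built by enumerating codons in T, C, A, G order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence whose length is a multiple of three."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def nucleotide_differences(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be of equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


if __name__ == "__main__":
    disclosed = "ATGATTATAATTTGC"  # first five codons of the Table 91A ORF
    variant = "ATGATCATCATATGT"    # hypothetical synonymous variant
    print(translate(disclosed), translate(variant))
    print(nucleotide_differences(disclosed, variant))
```

Both sequences translate to the same five residues even though they differ at four nucleotide positions.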

In addition to the human NOVX nucleotide sequences shown in SEQ ID NOS:1, 3, 5, 
7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, and 31, it will be appreciated by those skilled in the 
art that DNA sequence polymorphisms that lead to changes in the amino acid sequences of the 
NOVX polypeptides may exist within a population (e.g., the human population). Such genetic 
polymorphism in the NOVX genes may exist among individuals within a population due to 
natural allelic variation. As used herein, the terms "gene" and "recombinant gene" refer to 
nucleic acid molecules comprising an open reading frame (ORF) encoding an NOVX protein, 
preferably a vertebrate NOVX protein. Such natural allelic variations can typically result in 
1-5% variance in the nucleotide sequence of the NOVX genes. Any and all such nucleotide 
variations and resulting amino acid polymorphisms in the NOVX polypeptides, which are the 
result of natural allelic variation and that do not alter the functional activity of the NOVX 
polypeptides, are intended to be within the scope of the invention. 

Moreover, nucleic acid molecules encoding NOVX proteins from other species, and 
thus that have a nucleotide sequence that differs from the human SEQ ID NOS:1, 3, 5, 7, 9, 
11, 13, 15, 17, 19, 21, 23, 25, 27, 29, and 31 are intended to be within the scope of the 
invention. Nucleic acid molecules corresponding to natural allelic variants and homologues of 
the NOVX cDNAs of the invention can be isolated based on their homology to the human 
NOVX nucleic acids disclosed herein using the human cDNAs, or a portion thereof, as a 
hybridization probe according to standard hybridization techniques under stringent 
hybridization conditions. 

Accordingly, in another embodiment, an isolated nucleic acid molecule of the 
invention is at least 6 nucleotides in length and hybridizes under stringent conditions to the 
nucleic acid molecule comprising the nucleotide sequence of SEQ ID NOS:1, 3, 5, 7, 9, 11, 
13, 15, 17, 19, 21, 23, 25, 27, 29, and 31. In another embodiment, the nucleic acid is at least 
10, 25, 50, 100, 250, 500, 750, 1000, 1500, or 2000 or more nucleotides in length. In yet 
another embodiment, an isolated nucleic acid molecule of the invention hybridizes to the 
coding region. As used herein, the term "hybridizes under stringent conditions" is intended to 
describe conditions for hybridization and washing under which nucleotide sequences at least 
60% homologous to each other typically remain hybridized to each other. 

Homologs (i.e., nucleic acids encoding NOVX proteins derived from species other 
than human) or other related sequences (e.g., paralogs) can be obtained by low, moderate or 
high stringency hybridization with all or a portion of the particular human sequence as a probe 
using methods well known in the art for nucleic acid hybridization and cloning. 

As used herein, the phrase "stringent hybridization conditions" refers to conditions 
under which a probe, primer or oligonucleotide will hybridize to its target sequence, but to no 
other sequences. Stringent conditions are sequence-dependent and will be different in 
different circumstances. Longer sequences hybridize specifically at higher temperatures than 
shorter sequences. Generally, stringent conditions are selected to be about 5 °C lower than the 
thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The 
Tm is the temperature (under defined ionic strength, pH and nucleic acid concentration) at 
which 50% of the probes complementary to the target sequence hybridize to the target 
sequence at equilibrium. Since the target sequences are generally present at excess, at Tm, 
50% of the probes are occupied at equilibrium. Typically, stringent conditions will be those in 
which the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M 
sodium ion (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short 
probes, primers or oligonucleotides (e.g., 10 nt to 50 nt) and at least about 60°C for longer 
probes, primers and oligonucleotides. Stringent conditions may also be achieved with the 
addition of destabilizing agents, such as formamide. 
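
The relationship "about 5 °C lower than the Tm" can be illustrated numerically. The text above defines Tm but does not give a formula for it; the sketch below therefore substitutes the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a rough rule of thumb for short oligonucleotides that is introduced here purely as an assumption. The function names are hypothetical, and the example probe is the first 18 nucleotides of Table 91A.

```python
def wallace_tm(oligo: str) -> int:
    """Rough melting temperature (deg C) of a short oligonucleotide by the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C). This approximation is an assumption of the
    sketch; it ignores salt, pH, and probe concentration, all of which the
    text identifies as affecting the true Tm.
    """
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if at + gc != len(s):
        raise ValueError("oligo must contain only A, C, G, T")
    return 2 * at + 4 * gc


def stringent_temperature(tm_celsius: float, offset: float = 5.0) -> float:
    """Temperature about `offset` degrees below Tm, per the general rule in the text."""
    return tm_celsius - offset


if __name__ == "__main__":
    probe = "GCATTTGCCCCAGTAGCT"  # first 18 nt of Table 91A
    tm = wallace_tm(probe)
    print(tm, stringent_temperature(tm))
```

For this 18-mer (8 A/T and 10 G/C bases) the rule gives an estimated Tm of 56 °C, so a stringent hybridization temperature under these assumptions would be about 51 °C.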

Stringent conditions are known to those skilled in the art and can be found in Ausubel, 
et al., (eds.), Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. 
(1989), 6.3.1-6.3.6. Preferably, the conditions are such that sequences at least about 65%, 
70%, 75%, 85%, 90%, 95%, 98%, or 99% homologous to each other typically remain 
hybridized to each other. A non-limiting example of stringent hybridization conditions is 
hybridization in a high salt buffer comprising 6X SSC, 50 mM Tris-HCl (pH 7.5), 1 mM 
EDTA, 0.02% PVP, 0.02% Ficoll, 0.02% BSA, and 500 mg/ml denatured salmon sperm DNA 
at 65°C, followed by one or more washes in 0.2X SSC, 0.01% BSA at 50°C. An isolated 
nucleic acid molecule of the invention that hybridizes under stringent conditions to the 
sequences SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, and 31, 
corresponds to a naturally-occurring nucleic acid molecule. As used herein, a 
"naturally-occurring" nucleic acid molecule refers to an RNA or DNA molecule having a 
nucleotide sequence that occurs in nature (e.g., encodes a natural protein). 

In a second embodiment, a nucleic acid sequence that is hybridizable to the nucleic 
acid molecule comprising the nucleotide sequence of SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 
19, 21, 23, 25, 27, 29, and 31, or fragments, analogs or derivatives thereof, under conditions of 
moderate stringency is provided. A non-limiting example of moderate stringency 
hybridization conditions is hybridization in 6X SSC, 5X Denhardt's solution, 0.5% SDS and 
100 mg/ml denatured salmon sperm DNA at 55°C, followed by one or more washes in 
1X SSC, 0.1% SDS at 37°C. Other conditions of moderate stringency that may be used are 
well-known within the art. See, e.g., Ausubel, et al. (eds.), 1993, Current Protocols in 
Molecular Biology, John Wiley & Sons, NY, and Kriegler, 1990, Gene Transfer and 
Expression, A Laboratory Manual, Stockton Press, NY. 

In a third embodiment, a nucleic acid that is hybridizable to the nucleic acid molecule 
comprising the nucleotide sequences SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 
27, 29, and 31, or fragments, analogs or derivatives thereof, under conditions of low 
stringency, is provided. A non-limiting example of low stringency hybridization conditions 
is hybridization in 35% formamide, 5X SSC, 50 mM Tris-HCl (pH 7.5), 5 mM EDTA, 
0.02% PVP, 0.02% Ficoll, 0.2% BSA, 100 mg/ml denatured salmon sperm DNA, 10% 
(wt/vol) dextran sulfate at 40°C, followed by one or more washes in 2X SSC, 25 mM Tris-HCl 
(pH 7.4), 5 mM EDTA, and 0.1% SDS at 50°C. Other conditions of low stringency that may 
be used are well known in the art (e.g., as employed for cross-species hybridizations). See, 
e.g., Ausubel, et al. (eds.), 1993, Current Protocols in Molecular Biology, John Wiley 
& Sons, NY, and Kriegler, 1990, Gene Transfer and Expression, A Laboratory 
Manual, Stockton Press, NY; Shilo and Weinberg, 1981, Proc Natl Acad Sci USA 78: 
6789-6792. 
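
For side-by-side comparison, the three non-limiting example conditions given above (stringent, moderate, and low stringency) can be collected into a small lookup structure. This is an illustrative sketch only: the class and function names are hypothetical, and the buffer strings are abridged from the text, omitting the blocking agents and carrier DNA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridizationCondition:
    """One example hybridization/wash regime, abridged from the specification."""
    name: str
    hybridization_buffer: str
    hybridization_temp_c: int
    wash_buffer: str
    wash_temp_c: int


CONDITIONS = (
    HybridizationCondition(
        "stringent",
        "6X SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA", 65,
        "0.2X SSC, 0.01% BSA", 50,
    ),
    HybridizationCondition(
        "moderate",
        "6X SSC, 5X Denhardt's solution, 0.5% SDS", 55,
        "1X SSC, 0.1% SDS", 37,
    ),
    HybridizationCondition(
        "low",
        "35% formamide, 5X SSC, 50 mM Tris-HCl (pH 7.5), 5 mM EDTA", 40,
        "2X SSC, 25 mM Tris-HCl (pH 7.4), 5 mM EDTA, 0.1% SDS", 50,
    ),
)


def condition_by_name(name: str) -> HybridizationCondition:
    """Look up one of the example regimes by its stringency label."""
    for condition in CONDITIONS:
        if condition.name == name.lower():
            return condition
    raise KeyError(f"no example condition named {name!r}")


if __name__ == "__main__":
    for c in CONDITIONS:
        print(f"{c.name:>9}: hybridize at {c.hybridization_temp_c} C, "
              f"wash in {c.wash_buffer} at {c.wash_temp_c} C")
```

Listing the regimes this way makes the trend visible: the example hybridization temperature falls from 65°C to 55°C to 40°C, and the wash salt concentration rises from 0.2X to 1X to 2X SSC, as stringency decreases.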



Conservative Mutations 

In addition to naturally-occurring allelic variants of NOVX sequences that may exist in 
the population, the skilled artisan will further appreciate that changes can be introduced by 
mutation into the nucleotide sequences SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 
25, 27, 29, and 31, thereby leading to changes in the amino acid sequences of the encoded 
NOVX proteins, without altering the functional ability of said NOVX proteins. For example, 
nucleotide substitutions leading to amino acid substitutions at "non-essential" amino acid 
residues can be made in the sequence SEQ ID NOS:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 
26, 28, 30, or 32. A "non-essential" amino acid residue is a residue that can be altered from 
the wild-type sequences of the NOVX proteins without altering their biological activity, 
whereas an "essential" amino acid residue is required for such biological activity. For 
example, amino acid residues that are conserved among the NOVX proteins of the invention 
are predicted to be particularly non-amenable to alteration. Amino acids for which 
conservative substitutions can be made are well-known within the art. 

Another aspect of the invention pertains to nucleic acid molecules encoding NOVX 
proteins that contain changes in amino acid residues that are not essential for activity. Such 
NOVX proteins differ in amino acid sequence from SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 
19, 21, 23, 25, 27, 29, and 31 yet retain biological activity. In one embodiment, the isolated 
nucleic acid molecule comprises a nucleotide sequence encoding a protein, wherein the protein 
comprises an amino acid sequence at least about 45% homologous to the amino acid 
sequences SEQ ID NOS:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, and 32. 
Preferably, the protein encoded by the nucleic acid molecule is at least about 60% homologous 
to SEQ ID NOS:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, and 32; more preferably at 
least about 70% homologous to SEQ ID NOS:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 
or 32; still more preferably at least about 80% homologous to SEQ ID NOS:2, 4, 6, 8, 10, 12, 
14, 16, 18, 20, 22, 24, 26, 28, 30, or 32; even more preferably at least about 90% homologous 
to SEQ ID NOS:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, or 32; and most preferably 
at least about 95% homologous to SEQ ID NOS:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 
28, 30, or 32. 

An isolated nucleic acid molecule encoding an NOVX protein homologous to the 
protein of SEQ ID NOS:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, or 32 can be 
created by introducing one or more nucleotide substitutions, additions or deletions into the 
nucleotide sequence of SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, and 
31, such that one or more amino acid substitutions, additions or deletions are introduced into 
the encoded protein. 

Mutations can be introduced into SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 
25, 27, 29, and 31 by standard techniques, such as site-directed mutagenesis and 
PCR-mediated mutagenesis. Preferably, conservative amino acid substitutions are made at 
one or more predicted, non-essential amino acid residues. A "conservative amino acid 
substitution" is one in which the amino acid residue is replaced with an amino acid residue 
having a similar side chain. Families of amino acid residues having similar side chains have 
been defined within the art. These families include amino acids with basic side chains (e.g., 
lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged 
polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), 
nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, 
methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and 
aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). Thus, a predicted 
non-essential amino acid residue in the NOVX protein is replaced with another amino acid 
residue from the same side chain family. Alternatively, in another embodiment, mutations can 
be introduced randomly along all or part of an NOVX coding sequence, such as by saturation 
mutagenesis, and the resultant mutants can be screened for NOVX biological activity to 
identify mutants that retain activity. Following mutagenesis of SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 
15, 17, 19, 21, 23, 25, 27, 29, and 31, the encoded protein can be expressed by any 
recombinant technology known in the art and the activity of the protein can be determined. 
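
By way of non-limiting illustration, the side chain families listed above can be written as a small lookup, so that a proposed substitution may be checked for membership in a common family. The following Python sketch is an assumption made for illustration only: the family names, the function names, and the rule that one shared family suffices are not taken from the specification.

```python
# Side-chain families as listed in the text, in single-letter code.
# A residue may belong to more than one family (e.g., tyrosine).
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "uncharged_polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta_branched": set("TVI"),
    "aromatic": set("YFWH"),
}


def shared_families(original, replacement):
    """Return the sorted names of families that contain both residues."""
    return sorted(
        name
        for name, members in SIDE_CHAIN_FAMILIES.items()
        if original in members and replacement in members
    )


def is_conservative(original, replacement):
    """Assumed rule: a change is conservative if both residues share a family."""
    return original != replacement and bool(shared_families(original, replacement))
```

Under these assumptions, `is_conservative("K", "R")` holds because lysine and arginine are both basic, whereas `is_conservative("K", "D")` does not.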

The relatedness of amino acid families may also be determined based on side chain 
interactions. Substituted amino acids may be fully conserved "strong" residues or fully 
conserved "weak" residues. The "strong" group of conserved amino acid residues may be any 
one of the following groups: STA, NEQK, NHQK, NDEQ, QHRK, MILV, MILF, HY, FYW, 
wherein the single letter amino acid codes are grouped by those amino acids that may be 
substituted for each other. Likewise, the "weak" group of conserved residues may be any one 
of the following: CSA, ATV, SAG, STNK, STPA, SGND, SNDEQK, NDEQHK, NEQHRK, 
VLIM, HFY, wherein the letters within each group represent the single letter amino acid code. 
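
As a hedged illustration, the "strong" and "weak" groups recited above can be applied to a pair of residues as follows. The function name and the order of precedence (identical, then strong, then weak) are assumptions of this sketch and are not stated in the specification.

```python
# Groups copied from the text; letters are single-letter amino acid codes.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "VLIM", "HFY"]


def conservation_class(a, b):
    """Classify a residue pair as 'identical', 'strong', 'weak' or 'none'."""
    if a == b:
        return "identical"
    if any(a in group and b in group for group in STRONG_GROUPS):
        return "strong"
    if any(a in group and b in group for group in WEAK_GROUPS):
        return "weak"
    return "none"
```

For example, serine and threonine fall together in the strong group STA, while cysteine and serine share only the weak group CSA.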

In one embodiment, a mutant NOVX protein can be assayed for (i) the ability to form 
protein:protein interactions with other NOVX proteins, other cell-surface proteins, or 
biologically-active portions thereof; (ii) complex formation between a mutant NOVX protein 
and an NOVX ligand; or (iii) the ability of a mutant NOVX protein to bind to an intracellular 
target protein or biologically-active portion thereof (e.g., avidin proteins). 

In yet another embodiment, a mutant NOVX protein can be assayed for the ability to 
regulate a specific biological function (e.g., regulation of insulin release). 

Antisense Nucleic Acids 

Another aspect of the invention pertains to isolated antisense nucleic acid molecules 
that are hybridizable to or complementary to the nucleic acid molecule comprising the 
nucleotide sequence of SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, and 
31, or fragments, analogs or derivatives thereof. An "antisense" nucleic acid comprises a 
nucleotide sequence that is complementary to a "sense" nucleic acid encoding a protein (e.g., 
complementary to the coding strand of a double-stranded cDNA molecule or complementary 
to an mRNA sequence). In specific aspects, antisense nucleic acid molecules are provided that 
comprise a sequence complementary to at least about 10, 25, 50, 100, 250 or 500 nucleotides 
or an entire NOVX coding strand, or to only a portion thereof. Nucleic acid molecules 
encoding fragments, homologs, derivatives and analogs of an NOVX protein of SEQ ID 
NOS:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, or 32, or antisense nucleic acids 
complementary to an NOVX nucleic acid sequence of SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 
17, 19, 21, 23, 25, 27, 29, and 31, are additionally provided. 

In one embodiment, an antisense nucleic acid molecule is antisense to a "coding 
region" of the coding strand of a nucleotide sequence encoding an NOVX protein. The term 
"coding region" refers to the region of the nucleotide sequence comprising codons which are 
translated into amino acid residues. In another embodiment, the antisense nucleic acid 
molecule is antisense to a "noncoding region" of the coding strand of a nucleotide sequence 
encoding the NOVX protein. The term "noncoding region" refers to 5' and 3' sequences which 
flank the coding region that are not translated into amino acids (i.e., also referred to as 5' and 
3' untranslated regions). 

Given the coding strand sequences encoding the NOVX protein disclosed herein, 
antisense nucleic acids of the invention can be designed according to the rules of Watson and 
Crick or Hoogsteen base pairing. The antisense nucleic acid molecule can be complementary 
to the entire coding region of NOVX mRNA, but more preferably is an oligonucleotide that is 
antisense to only a portion of the coding or noncoding region of NOVX mRNA. For example, 
the antisense oligonucleotide can be complementary to the region surrounding the translation 
start site of NOVX mRNA. An antisense oligonucleotide can be, for example, about 5, 10, 15, 
20, 25, 30, 35, 40, 45 or 50 nucleotides in length. An antisense nucleic acid of the invention 
can be constructed using chemical synthesis or enzymatic ligation reactions using procedures 
known in the art. For example, an antisense nucleic acid (e.g., an antisense oligonucleotide) 
can be chemically synthesized using naturally-occurring nucleotides or variously modified 
nucleotides designed to increase the biological stability of the molecules or to increase the 
physical stability of the duplex formed between the antisense and sense nucleic acids (e.g., 
phosphorothioate derivatives and acridine substituted nucleotides can be used). 
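
A minimal sketch of Watson-Crick antisense design is given below for illustration. It assumes, purely for the example, that the translation start site is the first AUG in the supplied mRNA string, that the oligonucleotide is DNA, and that a fixed number of flanking nucleotides is taken on each side; the function name, the `flank` parameter and the example sequence are hypothetical.

```python
# Watson-Crick complement of an RNA target, written as a DNA oligonucleotide.
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def antisense_oligo(mrna, flank=8):
    """Return a DNA oligo (5'->3') complementary to the region around the first AUG.

    Assumption: the first AUG marks the translation start site.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start < 0:
        raise ValueError("no AUG start codon found")
    # Target window: `flank` nucleotides on each side of the start codon,
    # truncated at the ends of the transcript.
    target = mrna[max(0, start - flank):start + 3 + flank]
    # Complement each base, then reverse so the oligo reads 5'->3'.
    return target.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


# Hypothetical 15-nucleotide transcript fragment, not an NOVX sequence.
example = antisense_oligo("GGCACCAUGGCUAGC", flank=3)
```

With `flank=3` the target window is ACCAUGGCU, so `example` is the nine-nucleotide oligonucleotide AGCCATGGT, which lies within the length range recited above.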

Examples of modified nucleotides that can be used to generate the antisense nucleic 
acid include: 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, 
xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl) uracil, 5-carboxymethylaminomethyl- 
2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, 
inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 
2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 
7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, 
beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil, 
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, 
queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, 
uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid (v), 5-methyl-2-thiouracil, 
3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and 2,6-diaminopurine. Alternatively, the 
antisense nucleic acid can be produced biologically using an expression vector into which a 
nucleic acid has been subcloned in an antisense orientation (i.e., RNA transcribed from the 
inserted nucleic acid will be of an antisense orientation to a target nucleic acid of interest, 
described further in the following subsection). 

The antisense nucleic acid molecules of the invention are typically administered to a 
subject or generated in situ such that they hybridize with or bind to cellular mRNA and/or 
genomic DNA encoding an NOVX protein to thereby inhibit expression of the protein (e.g., by 
inhibiting transcription and/or translation). The hybridization can be by conventional 
nucleotide complementarity to form a stable duplex, or, for example, in the case of an 
antisense nucleic acid molecule that binds to DNA duplexes, through specific interactions in 
the major groove of the double helix. An example of a route of administration of antisense 
nucleic acid molecules of the invention includes direct injection at a tissue site. Alternatively, 
antisense nucleic acid molecules can be modified to target selected cells and then administered 
systemically. For example, for systemic administration, antisense molecules can be modified 
such that they specifically bind to receptors or antigens expressed on a selected cell surface 
(e.g., by linking the antisense nucleic acid molecules to peptides or antibodies that bind to cell 
surface receptors or antigens). The antisense nucleic acid molecules can also be delivered to 
cells using the vectors described herein. To achieve sufficient nucleic acid molecules, vector 
constructs in which the antisense nucleic acid molecule is placed under the control of a strong 
pol II or pol III promoter are preferred. 

In yet another embodiment, the antisense nucleic acid molecule of the invention is an 
α-anomeric nucleic acid molecule. An α-anomeric nucleic acid molecule forms specific 
double-stranded hybrids with complementary RNA in which, contrary to the usual β-units, the 
strands run parallel to each other. See, e.g., Gaultier, et al., 1987. Nucl. Acids Res. 15: 
6625-6641. The antisense nucleic acid molecule can also comprise a 
2'-o-methylribonucleotide (See, e.g., Inoue, et al., 1987. Nucl. Acids Res. 15: 6131-6148) or a 
chimeric RNA-DNA analogue (See, e.g., Inoue, et al., 1987. FEBS Lett. 215: 327-330). 



Ribozymes and PNA Moieties 

Nucleic acid modifications include, by way of non-limiting example, modified bases, 
and nucleic acids whose sugar phosphate backbones are modified or derivatized. These 
modifications are carried out at least in part to enhance the chemical stability of the modified 
nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in 
therapeutic applications in a subject. 

In one embodiment, an antisense nucleic acid of the invention is a ribozyme. 
Ribozymes are catalytic RNA molecules with ribonuclease activity that are capable of 
cleaving a single-stranded nucleic acid, such as an mRNA, to which they have a 
complementary region. Thus, ribozymes (e.g., hammerhead ribozymes as described in 
Haseloff and Gerlach 1988. Nature 334: 585-591) can be used to catalytically cleave NOVX 
mRNA transcripts to thereby inhibit translation of NOVX mRNA. A ribozyme having 
specificity for an NOVX-encoding nucleic acid can be designed based upon the nucleotide 
sequence of an NOVX cDNA disclosed herein (i.e., SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 
19, 21, 23, 25, 27, 29, and 31). For example, a derivative of a Tetrahymena L-19 IVS RNA 
can be constructed in which the nucleotide sequence of the active site is complementary to the 
nucleotide sequence to be cleaved in an NOVX-encoding mRNA. See, e.g., U.S. Patent 
4,987,071 to Cech, et al. and U.S. Patent 5,116,742 to Cech, et al. NOVX mRNA can also be 
used to select a catalytic RNA having a specific ribonuclease activity from a pool of RNA 
molecules. See, e.g., Bartel et al., (1993) Science 261:1411-1418. 

Alternatively, NOVX gene expression can be inhibited by targeting nucleotide 
sequences complementary to the regulatory region of the NOVX nucleic acid (e.g., the NOVX 
promoter and/or enhancers) to form triple helical structures that prevent transcription of the 
NOVX gene in target cells. See, e.g., Helene, 1991. Anticancer Drug Des. 6: 569-84; Helene, 
et al. 1992. Ann. N. Y. Acad. Sci. 660: 27-36; Maher, 1992. Bioassays 14: 807-15. 

In various embodiments, the NOVX nucleic acids can be modified at the base moiety, 
sugar moiety or phosphate backbone to improve, e.g., the stability, hybridization, or solubility 
of the molecule. For example, the deoxyribose phosphate backbone of the nucleic acids can 
be modified to generate peptide nucleic acids. See, e.g., Hyrup, et al., 1996. Bioorg Med 
Chem 4: 5-23. As used herein, the terms "peptide nucleic acids" or "PNAs" refer to nucleic 
acid mimics (e.g., DNA mimics) in which the deoxyribose phosphate backbone is replaced by 
a pseudopeptide backbone and only the four natural nucleobases are retained. The neutral 
backbone of PNAs has been shown to allow for specific hybridization to DNA and RNA under 
conditions of low ionic strength. The synthesis of PNA oligomers can be performed using 
standard solid phase peptide synthesis protocols as described in Hyrup, et al., 1996. supra; 
Perry-O'Keefe, et al., 1996. Proc. Natl. Acad. Sci. USA 93: 14670-14675. 

PNAs of NOVX can be used in therapeutic and diagnostic applications. For example, 
PNAs can be used as antisense or antigene agents for sequence-specific modulation of gene 
expression by, e.g., inducing transcription or translation arrest or inhibiting replication. PNAs 
of NOVX can also be used, for example, in the analysis of single base pair mutations in a gene 
(e.g., PNA directed PCR clamping); as artificial restriction enzymes when used in combination 
with other enzymes, e.g., S1 nucleases (See, Hyrup, et al., 1996. supra); or as probes or primers 
for DNA sequence and hybridization (See, Hyrup, et al., 1996, supra; Perry-O'Keefe, et al., 
1996. supra). 

In another embodiment, PNAs of NOVX can be modified, e.g., to enhance their 
stability or cellular uptake, by attaching lipophilic or other helper groups to PNA, by the 
formation of PNA-DNA chimeras, or by the use of liposomes or other techniques of drug 
delivery known in the art. For example, PNA-DNA chimeras of NOVX can be generated that 
may combine the advantageous properties of PNA and DNA. Such chimeras allow DNA 
recognition enzymes (e.g., RNase H and DNA polymerases) to interact with the DNA portion 
while the PNA portion would provide high binding affinity and specificity. PNA-DNA 
chimeras can be linked using linkers of appropriate lengths selected in terms of base stacking, 
number of bonds between the nucleobases, and orientation (see, Hyrup, et al., 1996. supra). 
The synthesis of PNA-DNA chimeras can be performed as described in Hyrup, et al., 1996. 
supra and Finn, et al., 1996. Nucl Acids Res 24: 3357-3363. For example, a DNA chain can 
be synthesized on a solid support using standard phosphoramidite coupling chemistry, and 
modified nucleoside analogs, e.g., 5'-(4-methoxytrityl)amino-5'-deoxy-thymidine 
phosphoramidite, can be used between the PNA and the 5' end of DNA. See, e.g., Mag, et al., 
1989. Nucl Acid Res 17: 5973-5988. PNA monomers are then coupled in a stepwise manner 
to produce a chimeric molecule with a 5' PNA segment and a 3' DNA segment. See, e.g., 
Finn, et al., 1996. supra. Alternatively, chimeric molecules can be synthesized with a 5' DNA 
segment and a 3' PNA segment. See, e.g., Petersen, et al., 1975. Bioorg. Med. Chem. Lett. 5: 
1119-11124. 

In other embodiments, the oligonucleotide may include other appended groups such as 
peptides (e.g., for targeting host cell receptors in vivo), or agents facilitating transport across 
the cell membrane (see, e.g., Letsinger, et al., 1989. Proc. Natl. Acad. Sci. U.S.A. 86: 
6553-6556; Lemaitre, et al., 1987. Proc. Natl. Acad. Sci. 84: 648-652; PCT Publication No. 
WO 88/09810) or the blood-brain barrier (see, e.g., PCT Publication No. WO 89/10134). In 
addition, oligonucleotides can be modified with hybridization triggered cleavage agents (see, 
e.g., Krol, et al., 1988. BioTechniques 6:958-976) or intercalating agents (see, e.g., Zon, 1988. 
Pharm. Res. 5: 539-549). To this end, the oligonucleotide may be conjugated to another 
molecule, e.g., a peptide, a hybridization triggered cross-linking agent, a transport agent, a 
hybridization-triggered cleavage agent, and the like. 

NOVX Polypeptides 

A polypeptide according to the invention includes a polypeptide including the amino 
acid sequence of NOVX polypeptides whose sequences are provided in SEQ ID NOS:2, 4, 6, 
8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, or 32. The invention also includes a mutant or 
variant protein any of whose residues may be changed from the corresponding residues shown 
in SEQ ID NOS:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, or 32 while still encoding 
a protein that maintains its NOVX activities and physiological functions, or a functional 
fragment thereof. 

In general, an NOVX variant that preserves NOVX-like function includes any variant 
in which residues at a particular position in the sequence have been substituted by other amino 
acids, and further includes the possibility of inserting an additional residue or residues between 
two residues of the parent protein as well as the possibility of deleting one or more residues 
from the parent sequence. Any amino acid substitution, insertion, or deletion is encompassed 
by the invention. In favorable circumstances, the substitution is a conservative substitution as 
defined above. 

One aspect of the invention pertains to isolated NOVX proteins, and biologically-active 
portions thereof, or derivatives, fragments, analogs or homologs thereof. Also provided 
are polypeptide fragments suitable for use as immunogens to raise anti-NOVX antibodies. In 
one embodiment, native NOVX proteins can be isolated from cells or tissue sources by an 
appropriate purification scheme using standard protein purification techniques. In another 
embodiment, NOVX proteins are produced by recombinant DNA techniques. Alternative to 
recombinant expression, an NOVX protein or polypeptide can be synthesized chemically 
using standard peptide synthesis techniques. 

An "isolated" or "purified" polypeptide or protein or biologically-active portion thereof 
is substantially free of cellular material or other contaminating proteins from the cell or tissue 
source from which the NOVX protein is derived, or substantially free from chemical 
precursors or other chemicals when chemically synthesized. The language "substantially free 
of cellular material" includes preparations of NOVX proteins in which the protein is separated 
from cellular components of the cells from which it is isolated or recombinantly-produced. In 
one embodiment, the language "substantially free of cellular material" includes preparations of 
NOVX proteins having less than about 30% (by dry weight) of non-NOVX proteins (also 
referred to herein as a "contaminating protein"), more preferably less than about 20% of 
non-NOVX proteins, still more preferably less than about 10% of non-NOVX proteins, and 
most preferably less than about 5% of non-NOVX proteins. When the NOVX protein or 
biologically-active portion thereof is recombinantly-produced, it is also preferably 
substantially free of culture medium, i.e., culture medium represents less than about 20%, 
more preferably less than about 10%, and most preferably less than about 5% of the volume of 
the NOVX protein preparation. 

The language "substantially free of chemical precursors or other chemicals" includes 
preparations of NOVX proteins in which the protein is separated from chemical precursors or 
other chemicals that are involved in the synthesis of the protein. In one embodiment, the 
language "substantially free of chemical precursors or other chemicals" includes preparations 
of NOVX proteins having less than about 30% (by dry weight) of chemical precursors or 
non-NOVX chemicals, more preferably less than about 20% chemical precursors or 
non-NOVX chemicals, still more preferably less than about 10% chemical precursors or 
non-NOVX chemicals, and most preferably less than about 5% chemical precursors or 
non-NOVX chemicals. 

Biologically-active portions of NOVX proteins include peptides comprising amino 
acid sequences sufficiently homologous to or derived from the amino acid sequences of the 
NOVX proteins (e.g., the amino acid sequence shown in SEQ ID NOS:2, 4, 6, 8, 10, 12, 14, 
16, 18, 20, 22, 24, 26, 28, 30, or 32) that include fewer amino acids than the full-length NOVX 
proteins, and exhibit at least one activity of an NOVX protein. Typically, biologically-active 
portions comprise a domain or motif with at least one activity of the NOVX protein. A 
biologically-active portion of an NOVX protein can be a polypeptide which is, for example, 
10, 25, 50, 100 or more amino acid residues in length. 

Moreover, other biologically-active portions, in which other regions of the protein are 
deleted, can be prepared by recombinant techniques and evaluated for one or more of the 
functional activities of a native NOVX protein. 

In an embodiment, the NOVX protein has an amino acid sequence shown in SEQ ID 
NOS:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, or 32. In other embodiments, the 
NOVX protein is substantially homologous to SEQ ID NOS:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 
22, 24, 26, 28, 30, or 32, and retains the functional activity of the protein of SEQ ID NOS:2, 4, 
6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, or 32, yet differs in amino acid sequence due to 
natural allelic variation or mutagenesis, as described in detail below. Accordingly, in another 
embodiment, the NOVX protein is a protein that comprises an amino acid sequence at least 
about 45% homologous to the amino acid sequence SEQ ID NOS:2, 4, 6, 8, 10, 12, 14, 16, 18, 
20, 22, 24, 26, 28, 30, or 32, and retains the functional activity of the NOVX proteins of SEQ 
ID NOS:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, or 32. 

Determining Homology Between Two or More Sequences 

To determine the percent homology of two amino acid sequences or of two nucleic 
acids, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced 
in the sequence of a first amino acid or nucleic acid sequence for optimal alignment with a 
second amino or nucleic acid sequence). The amino acid residues or nucleotides at 
corresponding amino acid positions or nucleotide positions are then compared. When a 
position in the first sequence is occupied by the same amino acid residue or nucleotide as the 
corresponding position in the second sequence, then the molecules are homologous at that 
position (i.e., as used herein amino acid or nucleic acid "homology" is equivalent to amino 
acid or nucleic acid "identity"). 
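
For illustration only, the position-by-position comparison described above can be sketched in Python as follows. The sketch assumes that the two sequences have already been aligned to equal length with "-" marking gaps, and that percent homology is the number of identical positions divided by the number of aligned columns, multiplied by 100; the function name and the treatment of gap columns as non-matching positions are assumptions of the example.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns holding the same residue in both sequences.

    Assumptions: inputs are pre-aligned to equal length, and gap columns
    count toward the denominator but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)
```

Under these assumptions, two eight-residue sequences differing at one position are 87.5% homologous, and an alignment with one gap column and one mismatch over eight columns is 75% homologous.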

The nucleic acid sequence homology may be determined as the degree of identity 
between two sequences. The homology may be determined using computer programs known 
in the art, such as GAP software provided in the GCG program package. See, Needleman and 
Wunsch, 1970. J Mol Biol 48: 443-453. Using GCG GAP software with the following settings 
for nucleic acid sequence comparison: GAP creation penalty of 5.0 and GAP extension 
penalty of 0.3, the coding region of the analogous nucleic acid sequences referred to above 
exhibits a degree of identity preferably of at least 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 
99%, with the CDS (encoding) part of the DNA sequence shown in SEQ ID NOS:1, 3, 5, 7, 9, 
11, 13, 15, 17, 19, 21, 23, 25, 27, 29, and 31. 
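
The following sketch is offered only to show how a gap creation penalty and a gap extension penalty enter a Needleman-Wunsch style global alignment score. It is not the GCG GAP program. The match and mismatch scores, the function name, and the convention that a gap of length k costs the creation penalty plus k times the extension penalty are all assumptions made for the example.

```python
def global_alignment_score(a, b, match=1.0, mismatch=-1.0,
                           gap_creation=5.0, gap_extension=0.3):
    """Best global alignment score with affine gap penalties.

    Assumed convention: a gap of length k costs gap_creation + k * gap_extension.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    # pair[i][j]: best score for a[:i] vs b[:j] ending with a[i-1] paired to b[j-1]
    # gap_b[i][j]: ... ending with a[i-1] opposite a gap in b
    # gap_a[i][j]: ... ending with b[j-1] opposite a gap in a
    pair = [[neg] * (m + 1) for _ in range(n + 1)]
    gap_b = [[neg] * (m + 1) for _ in range(n + 1)]
    gap_a = [[neg] * (m + 1) for _ in range(n + 1)]
    pair[0][0] = 0.0
    for i in range(1, n + 1):
        gap_b[i][0] = -(gap_creation + gap_extension * i)
    for j in range(1, m + 1):
        gap_a[0][j] = -(gap_creation + gap_extension * j)
    open_cost = gap_creation + gap_extension
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            pair[i][j] = max(pair[i - 1][j - 1], gap_b[i - 1][j - 1],
                             gap_a[i - 1][j - 1]) + s
            # Open a new gap from any other state, or extend an existing one.
            gap_b[i][j] = max(pair[i - 1][j] - open_cost,
                              gap_a[i - 1][j] - open_cost,
                              gap_b[i - 1][j] - gap_extension)
            gap_a[i][j] = max(pair[i][j - 1] - open_cost,
                              gap_b[i][j - 1] - open_cost,
                              gap_a[i][j - 1] - gap_extension)
    return max(pair[n][m], gap_b[n][m], gap_a[n][m])
```

With the assumed scores, aligning ACGT against ACT yields three matches and one single-position gap, for a score of 3.0 - (5.0 + 0.3) = -2.3. Percent identity would then be read from the alignment that attains the best score.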

The term "sequence identity" refers to the degree to which two polynucleotide or 
polypeptide sequences are identical on a residue-by-residue basis over a particular region of 
comparison. The term "percentage of sequence identity" is calculated by comparing two 
optimally aligned sequences over that region of comparison, determining the number of 
positions at which the identical nucleic acid base (e.g., A, T, C, G, U, or I, in the case of 
nucleic acids) occurs in both sequences to yield the number of matched positions, dividing the 
number of matched positions by the total number of positions in the region of comparison (i.e., 
the window size), and multiplying the result by 100 to yield the percentage of sequence 
identity. The term "substantial identity" as used herein denotes a characteristic of a 
polynucleotide sequence, wherein the polynucleotide comprises a sequence that has at least 80 
percent sequence identity, preferably at least 85 percent identity and often 90 to 95 percent 
sequence identity, more usually at least 99 percent sequence identity as compared to a 
reference sequence over a comparison region. 

Chimeric and Fusion Proteins 

The invention also provides NOVX chimeric or fusion proteins. As used herein, an 
NOVX "chimeric protein" or "fusion protein" comprises an NOVX polypeptide 
operatively-linked to a non-NOVX polypeptide. An "NOVX polypeptide" refers to a polypeptide having 
an amino acid sequence corresponding to an NOVX protein SEQ ID NOS:2, 4, 6, 8, 10, 12, 
14, 16, 18, 20, 22, 24, 26, 28, 30, or 32, whereas a "non-NOVX polypeptide" refers to a 
polypeptide having an amino acid sequence corresponding to a protein that is not substantially 
homologous to the NOVX protein, e.g., a protein that is different from the NOVX protein and 
that is derived from the same or a different organism. Within an NOVX fusion protein the 
NOVX polypeptide can correspond to all or a portion of an NOVX protein. In one 
embodiment, an NOVX fusion protein comprises at least one biologically-active portion of an 
NOVX protein. In another embodiment, an NOVX fusion protein comprises at least two 
biologically-active portions of an NOVX protein. In yet another embodiment, an NOVX 
fusion protein comprises at least three biologically-active portions of an NOVX protein. 
Within the fusion protein, the term "operatively-linked" is intended to indicate that the NOVX 
polypeptide and the non-NOVX polypeptide are fused in-frame with one another. The 
non-NOVX polypeptide can be fused to the N-terminus or C-terminus of the NOVX 
polypeptide. 

In one embodiment, the fusion protein is a GST-NOVX fusion protein in which the 
NOVX sequences are fused to the C-terminus of the GST (glutathione S-transferase) 
sequences. Such fusion proteins can facilitate the purification of recombinant NOVX 
polypeptides. 

In another embodiment, the fusion protein is an NOVX protein containing a 
heterologous signal sequence at its N-terminus. In certain host cells (e.g., mammalian host 
cells), expression and/or secretion of NOVX can be increased through use of a heterologous 
signal sequence. 

In yet another embodiment, the fusion protein is an NOVX-immunoglobulin fusion 
protein in which the NOVX sequences are fused to sequences derived from a member of the 
immunoglobulin protein family. The NOVX-immunoglobulin fusion proteins of the invention 
can be incorporated into pharmaceutical compositions and administered to a subject to inhibit 
an interaction between an NOVX ligand and an NOVX protein on the surface of a cell, to 
thereby suppress NOVX-mediated signal transduction in vivo. The NOVX-immunoglobulin 
fusion proteins can be used to affect the bioavailability of an NOVX cognate ligand. 
Inhibition of the NOVX ligand/NOVX interaction may be useful therapeutically for both the 
treatment of proliferative and differentiative disorders, as well as modulating (e.g., promoting 
or inhibiting) cell survival. Moreover, the NOVX-immunoglobulin fusion proteins of the 
invention can be used as immunogens to produce anti-NOVX antibodies in a subject, to purify 
NOVX ligands, and in screening assays to identify molecules that inhibit the interaction of 
NOVX with an NOVX ligand. 

An NOVX chimeric or fusion protein of the invention can be produced by standard 
recombinant DNA techniques. For example, DNA fragments coding for the different 
polypeptide sequences are ligated together in-frame in accordance with conventional 
techniques, e.g., by employing blunt-ended or stagger-ended termini for ligation, restriction 
enzyme digestion to provide for appropriate termini, filling-in of cohesive ends as appropriate, 
alkaline phosphatase treatment to avoid undesirable joining, and enzymatic ligation. In 
another embodiment, the fusion gene can be synthesized by conventional techniques including 
automated DNA synthesizers. Alternatively, PCR amplification of gene fragments can be 
carried out using anchor primers that give rise to complementary overhangs between two 
consecutive gene fragments that can subsequently be annealed and reamplified to generate a 
chimeric gene sequence (see, e.g., Ausubel, et al. (eds.) Current Protocols in Molecular 
Biology, John Wiley & Sons, 1992). Moreover, many expression vectors are commercially 
available that already encode a fusion moiety (e.g., a GST polypeptide). An NOVX-encoding 
nucleic acid can be cloned into such an expression vector such that the fusion moiety is linked 
in-frame to the NOVX protein. 
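
As a hedged illustration of what "in-frame" means at the sequence level, the sketch below joins two coding sequences and verifies that the downstream sequence is read in the same triplet frame without an intervening stop codon. The function name and the short example sequences are hypothetical and are not NOVX or GST sequences; removal of the upstream stop codon is an assumption of the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(n_terminal_cds, c_terminal_cds):
    """Join two DNA coding sequences so both are read in one reading frame.

    Assumptions: each input begins at codon position 1, and any stop codon
    at the end of the upstream sequence is removed before joining.
    """
    upstream = n_terminal_cds.upper()
    downstream = c_terminal_cds.upper()
    if upstream[-3:] in STOP_CODONS:
        upstream = upstream[:-3]
    if len(upstream) % 3 or len(downstream) % 3:
        raise ValueError("coding sequences must consist of whole codons")
    fusion = upstream + downstream
    codons = [fusion[i:i + 3] for i in range(0, len(fusion), 3)]
    # A stop codon anywhere before the final codon would truncate the fusion.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("internal stop codon interrupts the fusion")
    return fusion


# Hypothetical fusion-moiety and insert sequences, for illustration only.
fusion_gene = fuse_in_frame("ATGTCCCCTTAA", "ATGGCTAGCTGA")
```

Here the upstream stop codon TAA is dropped, and `fusion_gene` reads ATG TCC CCT ATG GCT AGC TGA as seven consecutive codons.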

NOVX Agonists and Antagonists 

The invention also pertains to variants of the NOVX proteins that function as either 
NOVX agonists (i.e., mimetics) or as NOVX antagonists. Variants of the NOVX protein can 
be generated by mutagenesis (e.g., discrete point mutation or truncation of the NOVX protein). 
An agonist of the NOVX protein can retain substantially the same, or a subset of, the 
biological activities of the naturally occurring form of the NOVX protein. An antagonist of 
the NOVX protein can inhibit one or more of the activities of the naturally occurring form of 
the NOVX protein by, for example, competitively binding to a downstream or upstream 
member of a cellular signaling cascade which includes the NOVX protein. Thus, specific 
biological effects can be elicited by treatment with a variant of limited function. In one 
embodiment, treatment of a subject with a variant having a subset of the biological activities 
of the naturally occurring form of the protein has fewer side effects in a subject relative to 
treatment with the naturally occurring form of the NOVX proteins. 

Variants of the NOVX proteins that function as either NOVX agonists (i.e., mimetics) 
or as NOVX antagonists can be identified by screening combinatorial libraries of mutants 
(e.g., truncation mutants) of the NOVX proteins for NOVX protein agonist or antagonist 
activity. In one embodiment, a variegated library of NOVX variants is generated by 
combinatorial mutagenesis at the nucleic acid level and is encoded by a variegated gene 
library. A variegated library of NOVX variants can be produced by, for example, 
enzymatically ligating a mixture of synthetic oligonucleotides into gene sequences such that a 
degenerate set of potential NOVX sequences is expressible as individual polypeptides, or 
alternatively, as a set of larger fusion proteins (e.g., for phage display) containing the set of 
NOVX sequences therein. There are a variety of methods which can be used to produce 
libraries of potential NOVX variants from a degenerate oligonucleotide sequence. Chemical 
synthesis of a degenerate gene sequence can be performed in an automatic DNA synthesizer, 
and the synthetic gene then ligated into an appropriate expression vector. Use of a degenerate 
set of genes allows for the provision, in one mixture, of all of the sequences encoding the 
desired set of potential NOVX sequences. Methods for synthesizing degenerate 
oligonucleotides are well-known within the art. See, e.g., Narang, 1983. Tetrahedron 39: 3; 
Itakura, et al., 1984. Annu. Rev. Biochem. 53: 323; Itakura, et al., 1984. Science 198: 1056; 
Ike, et al., 1983. Nucl. Acids Res. 11: 477. 
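
How a single degenerate oligonucleotide stands for a whole set of sequences can be sketched as follows. The sketch assumes the degenerate positions are written with the standard IUPAC nucleotide ambiguity codes. The example oligonucleotides are hypothetical and are not taken from any NOVX sequence.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(oligo):
    """List every concrete sequence encoded by a degenerate oligonucleotide."""
    choices = [IUPAC[base] for base in oligo.upper()]
    return ["".join(bases) for bases in product(*choices)]


def library_size(oligo):
    """Count the encoded sequences without enumerating them."""
    size = 1
    for base in oligo.upper():
        size *= len(IUPAC[base])
    return size


# Hypothetical examples: one fully degenerate codon and one partly degenerate codon.
print(library_size("NNK"))        # number of distinct codons in an NNK mixture
print(expand_degenerate("ARY"))   # every codon encoded by the degenerate codon ARY
```

Because the library size is the product of the choices at each position, it grows multiplicatively with every added degenerate position. Counting before enumerating is therefore useful when planning a synthesis.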

Polypeptide Libraries 

In addition, libraries of fragments of the NOVX protein coding sequences can be used 
to generate a variegated population of NOVX fragments for screening and subsequent 
selection of variants of an NOVX protein. In one embodiment, a library of coding sequence 
fragments can be generated by treating a double-stranded PCR fragment of an NOVX coding 
sequence with a nuclease under conditions wherein nicking occurs only about once per 
molecule, denaturing the double-stranded DNA, renaturing the DNA to form double-stranded 
DNA that can include sense/antisense pairs from different nicked products, removing 
single-stranded portions from reformed duplexes by treatment with S1 nuclease, and ligating the 
resulting fragment library into an expression vector. By this method, expression libraries can 
be derived that encode N-terminal and internal fragments of various sizes of the NOVX 
proteins. 
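
The kinds of fragments such a library can contain may be sketched computationally. The sketch below is a simplified model. It does not simulate nuclease nicking, renaturation, or S1 trimming. It only enumerates the in-frame sub-sequences of a coding sequence and labels each by position. The toy coding sequence and the `min_codons` parameter are assumptions made for illustration.

```python
def coding_fragments(cds, min_codons=2):
    """Enumerate in-frame fragments of a coding sequence, labeled by position.

    Returns (kind, fragment) tuples, where kind is 'N-terminal', 'internal',
    or 'C-terminal'. The full-length sequence itself is excluded.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence must be a whole number of codons")
    n_codons = len(cds) // 3
    fragments = []
    for start in range(n_codons):
        for end in range(start + min_codons, n_codons + 1):
            if start == 0 and end == n_codons:
                continue  # skip the full-length coding sequence
            if start == 0:
                kind = "N-terminal"
            elif end == n_codons:
                kind = "C-terminal"
            else:
                kind = "internal"
            fragments.append((kind, cds[3 * start:3 * end]))
    return fragments


# Hypothetical four-codon coding sequence (not an NOVX sequence).
for kind, fragment in coding_fragments("ATGGCTGGTAAA"):
    print(kind, fragment)
```
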

Various techniques are known in the art for screening gene products of combinatorial 
libraries made by point mutations or truncation, and for screening cDNA libraries for gene 
products having a selected property. Such techniques are adaptable for rapid screening of the 
gene libraries generated by the combinatorial mutagenesis of NOVX proteins. The most 
widely used techniques, which are amenable to high throughput analysis, for screening large 
gene libraries typically include cloning the gene library into replicable expression vectors, 
transforming appropriate cells with the resulting library of vectors, and expressing the 
combinatorial genes under conditions in which detection of a desired activity facilitates 
isolation of the vector encoding the gene whose product was detected. Recursive ensemble 
mutagenesis (REM), a new technique that enhances the frequency of functional mutants in the 
libraries, can be used in combination with the screening assays to identify NOVX variants. 
See, e.g., Arkin and Yourvan, 1992. Proc. Natl. Acad. Sci. USA 89: 7811-7815; Delgrave, et 
al., 1993. Protein Engineering 6: 327-331. 

Anti-NOVX Antibodies 

Also included in the invention are antibodies to NOVX proteins, or fragments of 
NOVX proteins. The term "antibody" as used herein refers to immunoglobulin molecules and 
immunologically active portions of immunoglobulin (Ig) molecules, i.e., molecules that 
contain an antigen binding site that specifically binds (immunoreacts with) an antigen. Such 
antibodies include, but are not limited to, polyclonal, monoclonal, chimeric, single chain, Fab, 
Fab' and F(ab')2 fragments, and an Fab expression library. In general, an antibody molecule 
obtained from humans relates to any of the classes IgG, IgM, IgA, IgE and IgD, which differ 
from one another by the nature of the heavy chain present in the molecule. Certain classes 
have subclasses as well, such as IgG1, IgG2, and others. Furthermore, in humans, the light 
chain may be a kappa chain or a lambda chain. Reference herein to antibodies includes a 
reference to all such classes, subclasses and types of human antibody species. 

An isolated NOVX-related protein of the invention may be intended to serve as an 
antigen, or a portion or fragment thereof, and additionally can be used as an immunogen to 
generate antibodies that immunospecifically bind the antigen, using standard techniques for 
polyclonal and monoclonal antibody preparation. The full-length protein can be used or, 
alternatively, the invention provides antigenic peptide fragments of the antigen for use as 
immunogens. An antigenic peptide fragment comprises at least 6 amino acid residues of the 
amino acid sequence of the full length protein and encompasses an epitope thereof such that an 
antibody raised against the peptide forms a specific immune complex with the full length 



